BINDING DOMAINS FROM PLASMODIUM VIVAX AND
PLASMODIUM FALCIPARUM ERYTHROCYTE BINDING PROTEINS

BACKGROUND OF THE INVENTION
Malaria infects 200-400 million people each year, causing 1-2 million deaths, and thus remains one
of the most important infectious diseases in the world. Approximately 25 percent of all deaths of children in rural
Africa between the ages of one and four years are caused by malaria. Due to the importance of the disease as a
worldwide health problem, considerable effort is being expended to identify and develop malaria vaccines.

Malaria in humans is caused by four species of the parasite Plasmodium: P. falciparum, P. vivax,
P. knowlesi and P. malariae. The major cause of malaria in humans is P. falciparum, which infects 200 million to
400 million people every year, killing 1 to 4 million.

Duffy Antigen Binding Protein (DABP) and Sialic Acid Binding Protein (SABP) are soluble proteins
that appear in the culture supernatant after infected erythrocytes release merozoites. Immunochemical data indicate
that DABP and SABP, which are the respective ligands for the P. vivax and P. falciparum Duffy and sialic acid
receptors on erythrocytes, possess binding specificities that are identical in soluble and membrane-bound
form.

DABP is a 135 kDa protein which binds specifically to Duffy blood group determinants (Wertheimer
et al., Exp. Parasitol. 69: 340-350 (1989); Barnwell, et al., J. Exp. Med. 169: 1795-1802 (1989)). Thus, binding
of DABP is specific to human Duffy positive erythrocytes. There are four major Duffy phenotypes for human
erythrocytes: Fy(a), Fy(b), Fy(ab) and Fy(negative), as defined by the anti-Fya and anti-Fyb sera (Hadley et al., In Red
Cell Antigens and Antibodies, G. Garratty, ed. (Arlington, Va.: American Association of Blood Banks) pp. 17-33 (1986)).
DABP binds equally to both Fy(a) and Fy(b) erythrocytes, which are equally susceptible to invasion by P. vivax, but
not to Fy(negative) erythrocytes.

In the case of SABP, a 175 kDa protein, binding is specific to the glycophorin sialic acid residues
on erythrocytes (Camus and Hadley, Science 230:553-556 (1985); Orlandi, et al., J. Cell Biol. 116:901-909 (1992)).
Thus, neuraminidase treatment (which cleaves off sialic acid residues) renders erythrocytes immune to P. falciparum
invasion.

The specificities of binding and correlation to invasion by the parasite thus indicate that DABP
and SABP are the proteins of P. vivax and P. falciparum which interact with the Duffy antigen and sialic acids,
respectively, on the erythrocyte. The genes encoding both proteins have been cloned and the DNA and predicted
protein sequences have been determined (B. Kim Lee Sim, et al., J. Cell Biol. 111: 1877-1884 (1990); Fang, X., et al.,
Mol. Biochem. Parasitol. 44: 125-132 (1991)).

Despite considerable research efforts worldwide, because of the complexity of the Plasmodium
parasite and its interaction with its host, it has not been possible to discover a satisfactory solution for prevention
or abatement of the blood stage of malaria. Because malaria is such a large worldwide health problem, there is
a need for methods that abate the impact of this disease. The present invention provides effective preventive and
therapeutic measures against Plasmodium invasion.

SUMMARY OF THE INVENTION
The present invention provides compositions comprising isolated DABP binding domain
polypeptides and/or isolated SABP binding domain polypeptides. The DABP binding domain polypeptides preferably
comprise between about 200 and about 300 amino acid residues while the SABP binding domain polypeptides
preferably comprise between about 200 and about 600 amino acid residues. A preferred DABP binding domain
polypeptide has about 325 residues of the amino acid sequence found in SEQ ID NO:2. A preferred SABP binding
domain polypeptide has about 616 residues of the amino acid sequence of SEQ ID NO:4, encoded by the DNA
sequence of SEQ ID NO:3. The preferred DABP binding domain and SABP binding domain include the cysteine-rich
portions of the proteins shown in Figure 1.

The present invention also includes pharmaceutical compositions comprising a pharmaceutically
acceptable carrier and an isolated DABP binding domain polypeptide in an amount sufficient to induce a protective
immune response to Plasmodium vivax merozoites in an organism. In addition, isolated SABP binding domain
polypeptide in an amount sufficient to induce a protective immune response to Plasmodium falciparum may be added
to the pharmaceutical composition.

Also provided are pharmaceutical compositions comprising a pharmaceutically acceptable carrier
and an isolated SABP binding domain polypeptide in an amount sufficient to induce a protective immune response
to Plasmodium falciparum merozoites in an organism. In addition, isolated DABP binding domain polypeptide in an
amount sufficient to induce a protective immune response to Plasmodium vivax may be added to the pharmaceutical
composition.

Isolated polynucleotides which encode DABP binding domain polypeptides or SABP binding domain
polypeptides are also disclosed. In addition, the present invention includes a recombinant cell comprising the
polynucleotide encoding the DABP binding domain polypeptide.

The current invention further includes methods of inducing a protective immune response to
Plasmodium merozoites in a patient. The methods comprise administering to the patient an immunologically effective
amount of a pharmaceutical composition comprising a pharmaceutically acceptable carrier and an isolated DABP
binding domain polypeptide, an SABP binding domain polypeptide or a combination thereof.

The present disclosure also provides DNA sequences from additional P. falciparum genes in the
Duffy-binding like (DBL) family that have regions conserved with the P. falciparum 175 kD and P. vivax 135 kD
binding proteins.


DEFINITIONS 

As used herein a "DABP binding domain polypeptide" or a "SABP binding domain polypeptide" are
polypeptides substantially identical (as defined below) to a sequence from the cysteine-rich, amino-terminal region of
the Duffy antigen binding protein (DABP) or sialic acid binding protein (SABP), respectively. Such polypeptides are
capable of binding either the Duffy antigen or sialic acid residues on glycophorin. In particular, DABP binding domain
polypeptides consist of amino acid residues substantially similar to a sequence of SABP within a binding domain
containing the cysteine-rich sequence shown in Figure 1. SABP binding domain polypeptides consist of residues
substantially similar to a sequence of DABP within a binding domain containing the cysteine-rich sequence shown
in Figure 1.

The binding domain polypeptides encoded by the genes of the DBL family consist of those residues
substantially identical to the sequence of the binding domains of DABP and SABP as defined above. The DBL family
comprises sequences with substantial similarity to the conserved regions of the DABP and SABP. These include
those sequences reported here as ebl-1 (SEQ ID NO:5 and SEQ ID NO:6), E31a (SEQ ID NO:7 and SEQ ID NO:8), var-7
(SEQ ID NO:13 and SEQ ID NO:14, GenBank Accession No. L42636) and var-1 (SEQ ID NO:15 and SEQ ID
NO:16, GenBank Accession No. L40608). The sequence ebl-2 (SEQ ID NO:9 and SEQ ID NO:10) represents the
binding domains of var-7, and Proj3 (SEQ ID NO:11 and SEQ ID NO:12) is the binding domain of var-1. The DBL
family also includes two other members, var-2 and var-3 (GenBank Accession No. L40609).

The polypeptides of the invention can consist of the full length binding domain or a fragment
thereof. Typically DABP binding domain polypeptides will consist of from about 50 to about 325 residues, preferably
between about 75 and 300, more preferably between about 100 and about 250 residues. SABP binding domain
polypeptides will consist of from about 50 to about 616 residues, preferably between about 75 and 300, more
preferably between about 100 and about 250 residues.

Particularly preferred polypeptides of the invention are those within the binding domain that are 
conserved between SABP and the DBL family. Residues within these conserved domains are shown in Figure 1, 
below. 

Two polynucleotides or polypeptides are said to be "identical" if the sequence of nucleotides or
amino acid residues in the two sequences is the same when aligned for maximum correspondence. Optimal alignment
of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl.
Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970),
by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988), by
computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics
Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by inspection. The
term "substantial identity" means that a polypeptide comprises a sequence that has at least 80% sequence identity,
preferably 90%, more preferably 95% or more, compared to a reference sequence over a comparison window of
about 20 residues to about 600 residues, typically about 50 to about 500 residues, usually about 250 to 300
residues. The values of percent identity are determined using the programs above. Particularly preferred peptides
of the present invention comprise a sequence in which at least 70% of the cysteine residues conserved in DABP and
SABP are present. Additionally, the peptide will comprise a sequence in which at least 50% of the tryptophan
residues conserved in DABP and SABP are present. The term "substantial similarity" is also specifically defined here
with respect to those amino acid residues found to be conserved between DABP, SABP and the sequences of the
DBL family. These conserved amino acids consist prominently of tryptophan and cysteine residues conserved among
all sequences reported here. In addition, the conserved amino acid residues include phenylalanine residues, which may
be substituted with tyrosine. These amino acid residues may be determined to be conserved after the sequences
have been aligned by someone skilled in the art using the methods outlined above.
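
As a rough illustration of the identity and conserved-residue criteria above, the following minimal Python sketch scores two sequences that are assumed to be already aligned (for example by one of the programs named above). The function names, the gap-handling convention and the toy sequences are illustrative assumptions and are not part of the disclosure; a real comparison window would span about 20 to about 600 residues.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: gap columns ('-') count toward the window length but are
    never scored as matches. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def is_substantially_identical(aligned_a: str, aligned_b: str,
                               threshold: float = 80.0) -> bool:
    """Apply the 'at least 80% sequence identity' criterion."""
    return percent_identity(aligned_a, aligned_b) >= threshold


def conserved_residue_fraction(aligned_candidate: str,
                               conserved_columns: list[int],
                               residue: str) -> float:
    """Fraction of the given conserved alignment columns (0-based) at which
    the candidate carries the expected residue, e.g. 'C' or 'W'."""
    if not conserved_columns:
        raise ValueError("no conserved columns supplied")
    hits = sum(1 for i in conserved_columns if aligned_candidate[i] == residue)
    return hits / len(conserved_columns)


# Toy example (hypothetical 5-residue window, far shorter than a real one).
reference = "ACDCW"
candidate = "ACDCF"
identity = percent_identity(reference, candidate)
cysteine_fraction = conserved_residue_fraction(candidate, [1, 3], "C")
```

Under these assumptions a candidate would meet the preferred criteria when `identity` is at least 80 and `cysteine_fraction` is at least 0.7; the same helper can be called with `"W"` and a 0.5 cutoff for the tryptophan criterion.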

Another indication that polypeptide sequences are substantially identical is if one protein is
immunologically reactive with antibodies raised against the other protein. Thus, the polypeptides of the invention
include polypeptides immunologically reactive with antibodies raised against the SABP binding domain, the DABP
binding domain or raised against the conserved regions of the DBL family.

Another indication that nucleotide sequences are substantially identical is if two molecules hybridize
to each other under stringent conditions. Stringent conditions are sequence dependent and will be different in
different circumstances. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting
point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined
ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Typically,
stringent conditions will be those in which the salt concentration is about 0.02 molar at pH 7 and the temperature
is at least about 60°C.
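
The numerical rule of thumb above can be written down as a short Python sketch. The function names and the tolerances used to interpret "about" are assumptions made purely for illustration; the Tm itself is taken as a measured or separately estimated input.

```python
def stringent_temperature(tm_celsius: float, offset_celsius: float = 5.0) -> float:
    """Temperature roughly `offset_celsius` below the thermal melting point."""
    return tm_celsius - offset_celsius


def is_typically_stringent(salt_molar: float, ph: float,
                           temperature_celsius: float,
                           salt_tolerance: float = 0.005,
                           ph_tolerance: float = 0.5) -> bool:
    """Check the 'typical' conditions: about 0.02 M salt, pH 7, at least 60 C.

    The tolerances are hypothetical readings of the word 'about'.
    """
    return (abs(salt_molar - 0.02) <= salt_tolerance
            and abs(ph - 7.0) <= ph_tolerance
            and temperature_celsius >= 60.0)


# Example: a probe/target duplex with an assumed Tm of 68 C.
wash_temperature = stringent_temperature(68.0)
acceptable = is_typically_stringent(0.02, 7.0, wash_temperature)
```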

Nucleotide sequences are also substantially identical for purposes of this application when the
polypeptides which they encode are substantially identical. Thus, where one nucleic acid sequence encodes
essentially the same polypeptide as a second nucleic acid sequence, the two nucleic acid sequences are substantially
identical, even if they would not hybridize under stringent conditions due to silent substitutions permitted by the
genetic code (see Darnell et al. (1990) Molecular Cell Biology, Second Edition, Scientific American Books, W.H.
Freeman and Company, New York, NY, for an explanation of codon degeneracy and the genetic code).
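
To make the point about silent substitutions concrete, the sketch below translates two short DNA sequences with the standard genetic code and shows that they encode the same peptide despite differing at the nucleotide level. The helper names and the nine-base example sequences are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def encode_same_polypeptide(dna_a: str, dna_b: str) -> bool:
    """True when both sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


# Two hypothetical sequences differing by silent third-position changes.
seq_a = "TGTTGGTTT"  # Cys-Trp-Phe
seq_b = "TGCTGGTTC"  # Cys-Trp-Phe, using alternative Cys and Phe codons
same = encode_same_polypeptide(seq_a, seq_b)
```

Here `seq_a` and `seq_b` differ at two of nine nucleotide positions yet both translate to `CWF`, so they would count as substantially identical under the definition above.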

The phrases "isolated" or "biologically pure" refer to material which is substantially or essentially
free from components which normally accompany it as found in its native state. Thus, the binding domain
polypeptides of this invention do not contain materials normally associated with their in situ environment, e.g., other
proteins from a merozoite membrane. Typically, isolated proteins of the invention are at least about 80% pure,
usually at least about 90%, and preferably at least about 95% as measured by band intensity on a silver stained
gel.

Protein purity or homogeneity may be indicated by a number of means well known in the art, such
as polyacrylamide gel electrophoresis of a protein sample, followed by visualization upon staining. For certain
purposes high resolution will be needed and HPLC or a similar means for purification utilized.

The term "residue" refers to an amino acid (D or L) or amino acid mimetic incorporated in an
oligopeptide by an amide bond or amide bond mimetic. An amide bond mimetic of the invention includes peptide
backbone modifications well known to those skilled in the art.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 represents an alignment of the predicted amino acid sequences of the DABP binding
domain (Vivax) (SEQ ID NO:25), the two homologous SABP domains (SABP F1 (SEQ ID NO:26) and SABP F2 (SEQ
ID NO:27)) and the sequenced members of the DBL gene family (ebl-1 (SEQ ID NO:28), E31a (SEQ ID NO:29), ebl-2
(SEQ ID NO:30)) and the three homologous Proj3 domains (F1 (SEQ ID NO:31), F2 (SEQ ID NO:32) and F3 (SEQ ID
NO:33)).

Figure 2 represents a schematic of the pRE4 cloning vector. 

Figure 3 shows primers useful for isolating sequences encoding the conserved motifs of the
invention. Primers UNIEBP5 (SEQ ID NO:35) and UNIEBP5A (SEQ ID NO:36) encode the amino acid sequence of SEQ
ID NO:34; primers UNIEBP5B (SEQ ID NO:38) and UNIEBP5C (SEQ ID NO:39) encode the amino acid sequence of
SEQ ID NO:37; primers UNIEBP3 (SEQ ID NO:41) and UNIEBP3A (SEQ ID NO:42) encode the amino acid sequence
of SEQ ID NO:40; and primers UNIEBP3B (SEQ ID NO:44) and UNIEBP3C (SEQ ID NO:45) encode the amino acid
sequence of SEQ ID NO:43.

Figure 4 shows the relative position of the E31a ORF on chromosome 7. 

Figure 5 shows a map of a var gene cluster on chromosome 7. Relative positions of four YACs
(PfYEF2, PfYFEB, PfYKF8, PfYED9) are indicated under the chromosome 7 line at the top of the figure. YACs PfYFEB
and PfYKF8 lie entirely within a segment linked to CQR in a genetic cross, whereas YACs PfYED9 and PfYEF2 extend
beyond sites (identified by pE53a and pH270.5) that are dissociated from the chloroquine response. The var cluster
extends over a region of 100-150 kb in PfYED9. Exons and introns of the var-1, var-2 and var-3 genes within the
sequenced 40 kb segment are represented by solid and dotted lines, respectively; arrows show the coding direction.
Two more var elements outside of the sequenced region, identified by conserved restriction sites and
cross-hybridization, are indicated by dashed lines (var-2c and var-3c). Bold letters mark repeated restriction sites that
suggest a duplication in the var-1/var-3 and var-2c/var-3c segments. Enzyme recognition sites: A, ApaI; B, Bgfi; C,
ClaI; D, HindIII; E, HaeIII; H, BsM\; K, KpnI; M, BamHI; P, HpaI; S, SmaI. HindIII and HaeIII sites outside of the
sequenced region were not mapped. Positions and sizes of inserts from the Dd2 subsegment library are indicated:
a, pE280b; b, pB20.3; c, pB600; d, pE21b; e, pB20.24; f, pE32b; h, pE241a; i, pE240a/51d; j, pE33a; k, pB20.23;
l, /IL17BA6; m, pB20.26; n, pB20SU.27; o, p15J2J3. Inserts from the PfYED9 34 kb ApaI-SmaI fragment library:
r, pB3; s, p3G11; t, pJVs; u, p2E10; v, plG3; w, p2E3; x, p2B6; y, PE10; z, pJYr; a, pC5; ft, pi A3; y, p1F6; 6,
p3C3; €, pA2; C p2A9; 0. p3C4; 6, pJZn; k, p3D8.

DESCRIPTION OF THE PREFERRED EMBODIMENT 

The binding of merozoites and schizonts to erythrocytes is mediated by specific binding proteins
on the surface of the merozoite or schizont and is necessary for erythrocyte invasion. In the case of P. falciparum,
this binding involves specific interaction between sialic acid glycophorin residues on the erythrocyte and the sialic
acid binding protein (SABP) on the surface of the merozoite or schizont. The ability of purified SABP to bind
erythrocytes with chemically or enzymatically altered sialic acid residues paralleled the ability of P. falciparum to
invade these erythrocytes. Furthermore, sialic acid deficient erythrocytes neither bind SABP nor support invasion
by P. falciparum. The DNA encoding SABP from P. falciparum has also been cloned and sequenced.

In P. vivax, specific binding to the erythrocytes involves interaction between the Duffy blood group
antigen on the erythrocyte and the Duffy antigen binding protein (DABP) on the merozoite. Duffy binding proteins 
were defined biologically as those soluble proteins that appear in the culture supernatant after the infected 
erythrocytes release merozoites which bind to human Duffy positive, but not to human Duffy negative erythrocytes. 
It has been shown that binding of the P. vivax DABP protein to Duffy positive erythrocytes is blocked by antisera 
to the Duffy blood group determinants. Purified Duffy blood group antigens also block the binding to erythrocytes. 
DABP has also been shown to bind Duffy blood group determinants on Western blots. 

Duffy positive blood group determinants on human erythrocytes are essential for invasion of human 
erythrocytes by Plasmodium vivax. Both attachment and reorientation of P. vivax merozoites occur equally well on 
Duffy positive and negative erythrocytes. A junction then forms between the apical end of the merozoite and the 
Duffy-positive erythrocyte, followed by vacuole formation and entry of the merozoite into the vacuole. Junction 
formation and merozoite entry into the erythrocyte do not occur on Duffy negative cells, suggesting that the receptor 
specific for the Duffy determinant is involved in apical junction formation but not initial attachment. The DNA 
sequences encoding the DABP from P. vivax and P. knowlesi have been cloned and sequenced.

P. vivax red cell invasion has an absolute requirement for the Duffy blood group antigen. Isolates 
of P. falciparum, however, vary in their dependency on sialic acid for invasion. Certain P. falciparum clones have 
been developed which invade sialic acid deficient erythrocytes at normal rates. This suggests that certain strains 
of P. falciparum can interact with other ligands on the erythrocyte and so may possess multiple erythrocyte binding 
proteins with differing specificities. 

A basis for the present invention is the discovery of the binding domains in both DABP and SABP. 
Comparison of the predicted protein sequences of DABP and SABP reveals an amino terminal, cysteine rich region 
in both proteins with a high degree of similarity between the two proteins. The amino-terminal, cysteine rich region
of DABP contains about 325 amino acids, whereas the amino-terminal, cysteine rich region of SABP contains about
616 amino acids. This is due to an apparent duplication of the amino-terminal, cysteine-rich region in the SABP
protein. The cysteine residues are conserved between the two regions of SABP and DABP, as are the amino acids
surrounding the cysteine residues and a number of aromatic amino acid residues in this region. The amino-terminal
cysteine rich region and another cysteine rich region near the carboxyl-terminus show the most similarity between
the DABP and SABP proteins. The region of the amino acid sequence between these two cysteine rich regions shows
only limited similarity between DABP and SABP.

Other P. falciparum open reading frames and genes with regions that have substantial identity to 
binding domains of SABP and DABP have been identified. Multiple copies of these sequences exist in the parasite 
genome, indicating their important activity in host-parasite interactions. A family of these sequences (the DBL family)
has been cloned from chromosome 7 subsegment libraries that were constructed during genetic studies of the
chloroquine resistance locus (Wellems et al., PNAS 88: 3382-3386 (1991)). Certain of these transcripts are known
to be from the var family of genes that modulate cytoadherence and antigenic variation of P. falciparum-infected
erythrocytes (see Example 3, below).

Genes of the P. falciparum var family encode 200-350 kD variant surface molecules that determine
antigenic and adhesive properties of parasitized erythrocytes. The large repertoire of var genes (50-160 copies,
having sufficient DNA to account for 2-6% of the haploid genome), the dramatic sequence variation among the gene
copies, their variable expression in different parasite lines, the ready detection of DNA rearrangements, and the
receptor binding features of the encoded extracellular domains all implicate var genes as the major determinants of
antigenic variation and cytoadherence in P. falciparum malaria.

A second class of DBL-encoding transcripts includes single-copy genes such as ebl-1. Genetic
linkage studies have placed this gene within a region of chromosome 13 that affects invasion of malarial parasites
in human red blood cells (Wellems et al., Cell 49:633-642 (1987)). Both SABP and ebl-1 show restriction patterns
that are well conserved among different parasite isolates. This conservation of gene structure and the sequence
relationships between the ebl-1 and SABP domains suggest that ebl-1 encodes a novel erythrocyte binding molecule
having receptor properties distinct from those of SABP.

Southern hybridization experiments using probes from these open reading frames have indicated
that additional copies of these conserved sequences are located elsewhere in the genome. The largest of the open
reading frames on chromosome 7 is 8 kilobases and contains four tandem repeats homologous to the N-terminal,
cysteine-rich unit of SABP and DABP.

Figure 1 represents an alignment of the DBL family with the DABP binding domain and two
homologous regions of SABP (F1 and F2). The DBL family is divided into two sub-families to achieve optimal
alignment. Conserved cysteine residues are shown in bold face and conserved aromatic residues are underlined.

The polypeptides of the invention can be used to raise monoclonal antibodies specific for the
binding domains of SABP, DABP or the conserved regions in the DBL gene family. The antibodies can be used for
diagnosis of malarial infection or as therapeutic agents to inhibit binding of merozoites to erythrocytes. The
production of monoclonal antibodies against a desired antigen is well known to those of skill in the art and is not
reviewed in detail here.

The multitude of techniques available to those skilled in the art for production and manipulation
of various immunoglobulin molecules can thus be readily applied to inhibit binding. As used herein, the terms
"immunoglobulin" and "antibody" refer to a protein consisting of one or more polypeptides substantially encoded by
immunoglobulin genes. Immunoglobulins may exist in a variety of forms besides antibodies, including for example,
Fv, Fab, and F(ab')2, as well as in single chains. For a general review of immunoglobulin structure and function see
Fundamental Immunology, 2d Ed., W.E. Paul ed., Raven Press, N.Y., (1989).

Antibodies which bind polypeptides of the invention may be produced by a variety of means. The
production of non-human monoclonal antibodies, e.g., murine, lagomorpha, equine, etc., is well known and may be
accomplished by, for example, immunizing the animal with a preparation containing the polypeptide.
Antibody-producing cells obtained from the immunized animals are immortalized and screened, or screened first for
the production of antibody which inhibits binding between merozoites and erythrocytes and then immortalized.
For a discussion of general procedures of monoclonal antibody production see Harlow and Lane, Antibodies, A
Laboratory Manual, Cold Spring Harbor Publications, N.Y. (1988).

Thus, the present invention allows targeting of protective immune responses or monoclonal
antibodies to sequences in the binding domains that are conserved between SABP, DABP and encoded regions of the
DBL family. Identification of the binding regions of these proteins facilitates vaccine development because it allows
for a focus of effort upon the functional elements of the large molecules. The particular sequences within the
binding regions refine the target to critical regions that have been conserved during evolution, and are thus preferred
for use as vaccines against the parasite.

The genes of the DBL family (which have not previously been sequenced) can be used as markers
to detect the presence of the P. falciparum parasite in patients. This can be accomplished by means well known
to practitioners in the art using tissue or blood from symptomatic patients in PCR reactions with oligonucleotides
complementary to portions of the genes of the DBL family. Furthermore, sequencing the DBL family provides a
means for skilled practitioners to generate defined probes to be used as genetic markers in a variety of applications.

Additionally, the present invention defines a conserved motif, present in but not restricted to other
members of the subphylum Apicomplexa, which participates in host parasite interaction. This motif can be identified
in Plasmodium species and other parasitic protozoa by the polymerase chain reaction using the synthetic
oligonucleotide primers shown in Figure 3. PCR methods are described in detail below. These primers are designed
from regions in the conserved motif showing the highest degree of conservation among DABP, SABP and the DBL
family. Figure 3 shows these regions and the consensus amino acid sequences derived from them.

A. General Methods

Much of the nomenclature and general laboratory procedures required in this application can be
found in Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Vol. 1-3, Cold Spring Harbor Laboratory,
Cold Spring Harbor, NY, 1989. The manual is hereinafter referred to as "Sambrook, et al., 1989."

The practice of this invention involves the construction of recombinant nucleic acids and the
expression of genes in transfected cells. Molecular cloning techniques to achieve these ends are known in the art.
A wide variety of cloning and in vitro amplification methods suitable for the construction of recombinant nucleic acids
are well-known to persons of skill. Examples of these techniques and instructions sufficient to direct persons of skill
through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods
in Enzymology volume 152, Academic Press, Inc., San Diego, CA (Berger); and Current Protocols in Molecular Biology,
F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John
Wiley & Sons, Inc., (1994 Supplement) (Ausubel).

Examples of techniques sufficient to direct persons of skill through in vitro amplification methods,
including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other
RNA polymerase mediated techniques are found in Berger, Sambrook et al., 1989, and Ausubel, as well as Mullis
et al., (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al. eds.,
Academic Press Inc., San Diego, CA, 1990) ("Innis"); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The
Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al.
(1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomeli et al. (1989) J. Clin. Chem. 35, 1826; Landegren et al., (1988)
Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace, (1989) Gene 4, 560; and
Barringer et al. (1990) Gene 89, 117. Improved methods of cloning in vitro amplified nucleic acids are described
in Wallace et al., U.S. Pat. No. 5,426,039.

The culture of cells used in the present invention, including cell lines and cultured cells from tissue
or blood samples, is well known in the art. Freshney (Culture of Animal Cells, a Manual of Basic Technique, third
ed., Wiley-Liss, New York, NY (1994)) and the references cited therein provide a general guide to the culture of
cells.

DBL genes are optionally bound by antibodies in one of the embodiments of the present invention.
Methods of producing polyclonal and monoclonal antibodies are known to those of skill in the art. See, e.g., Coligan
(1991) Current Protocols in Immunology, Wiley/Greene, NY; and Harlow and Lane (1989) Antibodies: A Laboratory
Manual, Cold Spring Harbor Press, NY; Stites et al. (eds.) Basic and Clinical Immunology (4th ed.) Lange Medical
Publications, Los Altos, CA, and references cited therein; Goding (1986) Monoclonal Antibodies: Principles and
Practice (2d ed.) Academic Press, New York, NY; and Kohler and Milstein (1975) Nature 256: 495-497. Other
suitable techniques for antibody preparation include selection of libraries of recombinant antibodies in phage or similar
vectors. See, Huse et al. (1989) Science 246: 1275-1281; and Ward, et al. (1989) Nature 341: 544-546. Specific
monoclonal and polyclonal antibodies will usually bind with a KD of at least about 0.1 mM, more usually at least
about 1 µM, and most preferably at least about 0.1 µM or better.
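
Because a smaller dissociation constant means tighter binding, "at least about" a given KD is read here as "that KD or lower". The following minimal Python sketch sorts a measured KD (in molar units) into the three tiers just named; the function name, the tier labels and the strict cutoffs are illustrative assumptions.

```python
def affinity_tier(kd_molar: float) -> str:
    """Classify an antibody by dissociation constant (lower KD = tighter binding)."""
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    if kd_molar <= 1e-7:      # about 0.1 micromolar or better
        return "most preferred"
    if kd_molar <= 1e-6:      # about 1 micromolar
        return "more usual"
    if kd_molar <= 1e-4:      # about 0.1 millimolar
        return "usual"
    return "below threshold"


# Example: a hypothetical antibody with KD = 50 nM.
tier = affinity_tier(5e-8)
```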

B. Methods for isolating DNA encoding SABP, DABP and DBL binding regions

The nucleic acid compositions of this invention, whether RNA, cDNA, genomic DNA, or a hybrid
of the various combinations, may be isolated from natural sources or may be synthesized in vitro. The nucleic acids
claimed may be present in transformed or transfected whole cells, in a transformed or transfected cell lysate, or in
a partially purified or substantially pure form.

Techniques for nucleic acid manipulation of genes encoding the binding domains of the invention,
such as subcloning nucleic acid sequences encoding polypeptides into expression vectors, labelling probes, DNA
hybridization, and the like are described generally in Sambrook et al., 1989.

Recombinant DNA techniques can be used to produce the binding domain polypeptides. In general,
the DNA encoding the SABP and DABP binding domains is first cloned or isolated in a form suitable for ligation
into an expression vector. After ligation, the vectors containing the DNA fragments or inserts are introduced into
a suitable host cell for expression of the recombinant binding domains. The polypeptides are then isolated from the
host cells.

There are various methods of isolating the DNA sequences encoding the SABP, DABP and DBL 
binding domains. Typically, the DNA is isolated from a genomic or cDNA library using labelled oligonucleotide probes
specific for sequences in the DNA. Restriction endonuclease digestion of genomic DNA or cDNA containing the
appropriate genes can be used to isolate the DNA encoding the binding domains of these proteins. Since the DNA
sequences of the SABP and DABP genes are known, a panel of restriction endonucleases can be constructed to give
cleavage of the DNA in the desired regions. After restriction endonuclease digestion, DNA encoding the SABP binding
domain or DABP binding domain is identified by its ability to hybridize with nucleic acid probes, for example on
Southern blots, and these DNA regions are isolated by standard methods familiar to those of skill in the art. See
Sambrook, et al., 1989.
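
As an illustration of how a known sequence allows restriction endonucleases to be chosen for cleavage in desired regions, the following is a minimal in silico sketch. The toy sequence is hypothetical and is not taken from the SABP or DABP genes. The EcoRI recognition site (GAATTC, cut after the first G on the top strand) is used only as a familiar example, and the function names are illustrative.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Cut the top strand of a linear `seq` at each recognition site.

    `cut_offset` is the cut position inside the site (1 for EcoRI, G^AATTC).
    """
    seq = seq.upper()
    fragments = []
    previous_cut = 0
    for position in find_sites(seq, site):
        cut = position + cut_offset
        fragments.append(seq[previous_cut:cut])
        previous_cut = cut
    fragments.append(seq[previous_cut:])
    return fragments


if __name__ == "__main__":
    toy = "ATGCGAATTCTTAGGCCGAATTCAAT"  # hypothetical sequence with two EcoRI sites
    print(find_sites(toy, "GAATTC"))
    print(digest(toy, "GAATTC", cut_offset=1))
```

Repeating the search for each enzyme in a candidate panel shows which enzymes cut outside, rather than within, the region to be isolated.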

The polymerase chain reaction can also be used to prepare DABP, SABP and DBL binding domain DNA.
Polymerase chain reaction technology (PCR) is used to amplify nucleic acid sequences of the DABP and SABP binding
domains directly from mRNA, from cDNA, and from genomic libraries or cDNA libraries. The primers shown in Figure
3 are particularly preferred for this process.

Appropriate primers and probes for amplifying the SABP and DABP binding region DNA's are 
generated from analysis of the DNA sequences. In brief, oligonucleotide primers complementary to the two 3' borders 
of the DNA region to be amplified are synthesized. The polymerase chain reaction is then carried out using the two 
primers. See PCR Protocols: A Guide to Methods and Applications (Innis, M., Gelfand, D., Sninsky, J. and White,
T., eds.), Academic Press, San Diego, CA (1990). Primers can be selected to amplify the entire DABP regions or
to amplify smaller segments of the DABP and SABP binding domains, as desired. 
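
The primer selection described above can be sketched in code. This is a minimal illustration under stated assumptions: the template is a hypothetical toy sequence (not the primers of Figure 3 or any SABP or DABP sequence), the primer length of 8 is arbitrarily short for readability, and no melting temperature or specificity checks are performed. The forward primer matches the top strand at one border of the region, and the reverse primer is the reverse complement of the top strand at the other border, so that each primer anneals at the 3' border of the strand it copies.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primers(template: str, start: int, end: int, length: int = 8) -> tuple[str, str]:
    """Return (forward, reverse) primers, each 5'->3', flanking template[start:end]."""
    region = template.upper()[start:end]
    if len(region) < 2 * length:
        raise ValueError("region is shorter than two primers")
    forward = region[:length]                       # anneals to the bottom strand
    reverse = reverse_complement(region[-length:])  # anneals to the top strand
    return forward, reverse


def amplicon(template: str, forward: str, reverse: str):
    """Return the top-strand product bounded by the two primers, or None."""
    top = template.upper()
    left = top.find(forward)
    if left == -1:
        return None
    right = top.find(reverse_complement(reverse), left)
    if right == -1:
        return None
    return top[left:right + len(reverse)]


if __name__ == "__main__":
    toy = "GGGATGCCATTGACCGTAAGCTTGGCACTGAAACCC"  # hypothetical template
    fwd, rev = design_primers(toy, 3, 33)
    print(fwd, rev)
    print(amplicon(toy, fwd, rev))
```

Narrowing `start` and `end` corresponds to amplifying a smaller segment of a binding domain rather than the entire region.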

Oligonucleotides for use as probes are chemically synthesized according to the solid phase
phosphoramidite triester method first described by Beaucage, S.L. and Caruthers, M.H., 1981, Tetrahedron Letts.,
22(20):1859-1862 using an automated synthesizer, as described in Needham-VanDevanter, D.R., et al., 1984, Nucleic
Acids Res., 12:6159-6168. Purification of oligonucleotides is by either native acrylamide gel electrophoresis or by
anion-exchange HPLC as described in Pearson, J.D. and Regnier, F.E., 1983, J. Chrom., 255:137-149.

The sequence of the synthetic oligonucleotides can be verified using the chemical degradation 
method of Maxam, A.M. and Gilbert, W., 1980, in Grossman, L. and Moldave, D., eds., Academic Press, New York,
NY, Methods in Enzymology 65:499-560.

Other methods known to those of skill in the art may also be used to isolate DNA encoding all 
or part of the SABP or DABP binding domains. See Sambrook, et al., 1989.

C. Expression of DABP, SABP and DBL Binding Domain Polypeptides
Once binding domain DNAs are isolated and cloned, one may express the desired polypeptides in 
a recombinantly engineered cell such as bacteria, yeast, insect (especially employing baculoviral vectors), and
mammalian cells. It is expected that those of skill in the art are knowledgeable in the numerous expression systems
available for expression of the DNA encoding the DABP and SABP binding domains. No attempt to describe in detail
the various methods known for the expression of proteins in prokaryotes or eukaryotes will be made.

In brief summary, the expression of natural or synthetic nucleic acids encoding binding domains 
will typically be achieved by operably linking the DNA or cDNA to a promoter (which is either constitutive or
inducible), followed by incorporation into an expression vector. The vectors can be suitable for replication and
integration in either prokaryotes or eukaryotes. Typical expression vectors contain transcription and translation
terminators, initiation sequences, and promoters useful for regulation of the expression of the DNA encoding the
binding domains. To obtain high level expression of a cloned gene, it is desirable to construct expression plasmids
which contain, at the minimum, a strong promoter to direct transcription, a ribosome binding site for translational 
initiation, and a transcription/translation terminator. 

1. Expression in Prokaryotes

Examples of regulatory regions suitable for this purpose in E. coli are the promoter and operator
region of the E. coli tryptophan biosynthetic pathway as described by Yanofsky, C., 1984, J. Bacteriol.,
158:1018-1024 and the leftward promoter of phage lambda (PL) as described by Herskowitz, I. and Hagen, D., 1980,
Ann. Rev. Genet., 14:399-445. The inclusion of selection markers in DNA vectors transformed in E. coli is also
useful. Examples of such markers include genes specifying resistance to ampicillin, tetracycline, or chloramphenicol.
See Sambrook et al., 1989, for details concerning selection markers for use in E. coli.

The vector is selected to allow introduction into the appropriate host cell. Bacterial vectors are 
typically of plasmid or phage origin. Appropriate bacterial cells are infected with phage vector particles or 
transfected with naked phage vector DNA. If a plasmid vector is used, the bacterial cells are transfected with the
plasmid vector DNA. 

Expression systems for expressing the DABP and SABP binding domains are available using E. coli,
Bacillus sp. (Palva, I. et al., 1983, Gene 22:229-235; Mosbach, K. et al., Nature, 302:543-545) and Salmonella.
E. coli systems are preferred.

The binding domain polypeptides produced by prokaryote cells may not necessarily fold properly.
During purification from E. coli, the expressed polypeptides may first be denatured and then renatured. This can be
accomplished by solubilizing the bacterially produced proteins in a chaotropic agent such as guanidine HCl and
reducing all the cysteine residues with a reducing agent such as beta-mercaptoethanol. The polypeptides are then
renatured, either by slow dialysis or by gel filtration. See U.S. Patent No. 4,511,503.

Detection of the expressed antigen is achieved by methods known in the art such as radioimmunoassays,
Western blotting techniques or immunoprecipitation. Purification from E. coli can be achieved following procedures
described in U.S. Patent No. 4,511,503.

2. Synthesis of SABP, DABP and DBL Binding Domains in Eukaryotes

A variety of eukaryotic expression systems such as yeast, insect cell lines and mammalian cells, 
are known to those of skill in the art. As explained briefly below, the DABP and SABP binding domains may also 
be expressed in these eukaryotic systems. 

a. Expression in Yeast

Synthesis of heterologous proteins in yeast is well known and described. Methods in Yeast 
Genetics, Sherman, F., et al., Cold Spring Harbor Laboratory, (1982) is a well recognized work describing the various 
methods available to produce the binding domains in yeast. 

Examples of promoters for use in yeast include GAL1,10 (Johnson, M., and Davies, R.W., 1984,
Mol. and Cell. Biol. 4:1440-1448), ADH2 (Russell, D., et al., 1983, J. Biol. Chem., 258:2674-2682), PHO5 (EMBO
J. 6:675-680, 1982), and MFα1 (Herskowitz, I. and Oshima, Y., 1982, in The Molecular Biology of the Yeast
Saccharomyces (eds. Strathern, J.N., Jones, E.W., and Broach, J.R.), Cold Spring Harbor Lab., Cold Spring Harbor,
N.Y., pp. 181-209). A multicopy plasmid with a selective marker such as Leu-2, URA-3, Trp-1, and His-3 is also
desirable.

A number of yeast expression plasmids like YEp6, YEp13, YEp4 can be used as vectors. A gene
of interest can be fused to any of the promoters in various yeast vectors. The above-mentioned plasmids have been
fully described in the literature (Botstein, et al., 1979, Gene, 8:17-24; Broach, et al., 1979, Gene, 8:121-133).

Two procedures are used in transforming yeast cells. In one case, yeast cells are first converted 
into protoplasts using zymolyase, lyticase or glusulase, followed by addition of DNA and polyethylene glycol (PEG).
The PEG-treated protoplasts are then regenerated in a 3% agar medium under selective conditions. Details of this
procedure are given in the papers by J.D. Beggs, 1978, Nature (London), 275:104-109; and Hinnen, A., et al., 1978,
Proc. Natl. Acad. Sci. USA, 75:1929-1933. The second procedure does not involve removal of the cell wall. Instead,
the cells are treated with lithium chloride or acetate and PEG and put on selective plates (Ito, H., et al., 1983, J.
Bact., 153:163-168).

The binding domains can be isolated from yeast by lysing the cells and applying standard protein 
isolation techniques to the lysates. The monitoring of the purification process can be accomplished by using Western 
blot techniques or radioimmunoassays or other standard immunoassay techniques.

b. Expression in Mammalian and Insect Cell Cultures
Illustrative of cell cultures useful for the production of the binding domains are cells of insect or 
mammalian origin. Mammalian cell systems often will be in the form of monolayers of cells although mammalian cell 
suspensions may also be used. Illustrative examples of mammalian cell lines include VERO and HeLa cells, Chinese
hamster ovary (CHO) cell lines, WI38, BHK, COS-7 or MDCK cell lines.

As indicated above, the vector, e.g., a plasmid, which is used to transform the host cell,
preferably contains DNA sequences to initiate transcription and sequences to control the translation of the antigen
gene sequence. These sequences are referred to as expression control sequences. When the host cell is of insect
or mammalian origin, illustrative expression control sequences are obtained from the SV-40 promoter (Science,
222:524-527, 1983), the CMV I.E. Promoter (Proc. Natl. Acad. Sci. 81:659-663, 1984) or the metallothionein
promoter (Nature 296:39-42, 1982). The cloning vector containing the expression control sequences is cleaved using
restriction enzymes and adjusted in size as necessary or desirable and ligated with DNA coding for the SABP or
DABP polypeptides by means well known in the art.

As with yeast, when higher animal host cells are employed, polyadenylation or transcription
terminator sequences from known mammalian genes need to be incorporated into the vector. An example of a
terminator sequence is the polyadenylation sequence from the bovine growth hormone gene. Sequences for accurate
splicing of the transcript may also be included. An example of a splicing sequence is the VP1 intron from SV40
(Sprague, J. et al., 1983, J. Virol. 45: 773-781).

Additionally, gene sequences to control replication in the host cell may be incorporated into the 
vector such as those found in bovine papilloma virus type-vectors. See Saveria-Campo, M., 1985, "Bovine Papilloma virus
DNA: a Eukaryotic Cloning Vector" in DNA Cloning Vol. II: a Practical Approach, Ed. D.M. Glover, IRL Press, Arlington,
Virginia, pp. 213-238.

The host cells are competent or rendered competent for transformation by various means. There 
are several well-known methods of introducing DNA into animal cells. These include: calcium phosphate precipitation, 
fusion of the recipient cells with bacterial protoplasts containing the DNA, treatment of the recipient cells with
liposomes containing the DNA, DEAE dextran, electroporation and micro-injection of the DNA directly into the cells. 

The transformed cells are cultured by means well known in the art. See Biochemical Methods in Cell
Culture and Virology, Kuchler, R.J., Dowden, Hutchinson and Ross, Inc., (1977). The expressed DABP and SABP
binding domain polypeptides are isolated from cells grown as suspensions or as monolayers. The latter are recovered
by well known mechanical, chemical or enzymatic means.

c. Expression in recombinant vaccinia virus- or adenovirus-infected cells 
In addition to use in recombinant expression systems, the isolated binding domain DNA sequences 
can also be used to transform viruses that transfect host cells in the patient. Live attenuated viruses, such as 
vaccinia or adenovirus, are convenient alternatives for vaccines because they are inexpensive to produce and are 
easily transported and administered. Vaccinia vectors and methods useful in immunization protocols are described,
for example, in U.S. Patent No. 4,722,848. 

Suitable viruses for use in the present invention include, but are not limited to, pox viruses, such 
as canarypox and cowpox viruses, and vaccinia viruses, alpha viruses, adenoviruses, and other animal viruses. The 
recombinant viruses can be produced by methods well known in the art, for example, using homologous recombination 
or ligating two plasmids. A recombinant canarypox or cowpox virus can be made, for example, by inserting the
DNAs encoding the DABP and SABP binding domain polypeptides into plasmids so that they are flanked by viral
sequences on both sides. The DNAs encoding the binding domains are then inserted into the virus genome through
homologous recombination. 

A recombinant adenovirus can be produced, for example, by ligating together two plasmids each 
containing about 50% of the viral sequence and the DNA sequence encoding erythrocyte binding domain polypeptide.
Recombinant RNA viruses such as the alpha virus can be made via a cDNA intermediate using methods known in 
the art. 

In the case of vaccinia virus (for example, strain WR), the DNA sequence encoding the binding 
domains can be inserted in the genome by a number of methods including homologous recombination using a transfer 
vector, pTKgpt-OFIS, as described in Kaslow, et al., Science 252:1310-1313 (1991).

Alternately, the DNA encoding the SABP and DABP binding domains may be inserted into another
plasmid designed for producing recombinant vaccinia, such as pGS62, Langford, C.L., et al., 1986, Mol. Cell. Biol.
6:3191-3199. This plasmid consists of a cloning site for insertion of foreign genes, the P7.5 promoter of vaccinia
to direct synthesis of the inserted gene, and the vaccinia TK gene flanking both ends of the foreign gene.

Confirmation of production of recombinant virus can be achieved by DNA hybridization using cDNA
encoding the DABP and SABP binding domain polypeptides and by immunodetection techniques using antibodies
specific for the expressed binding domain polypeptides. Virus stocks may be prepared by infection of cells such as
HeLa S3 spinner cells and harvesting of virus progeny.

The recombinant virus of the present invention can be used to induce anti-SABP and anti-DABP 
binding domain antibodies in mammals, such as mice or humans. In addition, the recombinant virus can be used to 
produce the SABP and DABP binding domains by infecting host cells in vitro, which in turn express the polypeptide
(see section on expression of SABP and DABP binding domains in eukaryotic cells, above). 

The present invention also relates to host cells infected with the recombinant virus. The host cells 
of the present invention are preferably mammalian, such as BSC-1 cells. Host cells infected with the recombinant 
virus express the DABP and SABP binding domains on their cell surfaces. In addition, membrane extracts of the 
infected cells induce protective antibodies when used to inoculate or boost previously inoculated mammals. 

D. Purification of the SABP, DABP and DBL Binding Domain Polypeptides
The binding domain polypeptides produced by recombinant DNA technology may be purified by
standard techniques well known to those of skill in the art. Recombinantly produced binding domain polypeptides
can be directly expressed or expressed as a fusion protein. The protein is then purified by a combination of cell lysis
(e.g., sonication) and affinity chromatography. For fusion products, subsequent digestion of the fusion protein with
an appropriate proteolytic enzyme releases the desired SABP and DABP binding domains.

The polypeptides of this invention may be purified to substantial purity by standard techniques 
well known in the art, including selective precipitation with such substances as ammonium sulfate, column
chromatography, immunopurification methods, and others. See, for instance, R. Scopes, Protein Purification:
Principles and Practice, Springer-Verlag, New York, NY (1982).

E. Production of Binding Domains by protein chemistry techniques
The polypeptides of the invention can be synthetically prepared in a wide variety of ways. For 
instance, polypeptides of relatively short size can be synthesized in solution or on a solid support in accordance with
conventional techniques. Various automatic synthesizers are commercially available and can be used in accordance
with known protocols. See, for example, Stewart and Young, Solid Phase Peptide Synthesis, 2d. ed., Pierce Chemical
Co. (1984).

Alternatively, purified and isolated SABP, DABP or DBL family proteins may be treated with
proteolytic enzymes in order to produce the binding domain polypeptides. For example, recombinant DABP and SABP 
proteins may be used for this purpose. The DABP and SABP protein sequence may then be analyzed to select 
proteolytic enzymes to be used to generate polypeptides containing desired regions of the DABP and SABP binding 
domain. The desired polypeptides are then purified by using standard techniques for protein and peptide purification. 
For a review of standard techniques see Methods in Enzymology, "Guide to Protein Purification", M. Deutscher, ed.,
Vol. 182 (1990), pages 619-626.
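
The sequence analysis used to select a proteolytic enzyme can be sketched as an in silico digest. The following is a minimal illustration, assuming a simplified trypsin-like rule (cleavage on the C-terminal side of lysine or arginine unless the next residue is proline). The toy protein sequence and the function names are hypothetical and are not taken from the DABP or SABP sequences.

```python
def cleave(protein: str, residues: str = "KR", blocked_by: str = "P") -> list[tuple[int, int, str]]:
    """Cut C-terminal to any residue in `residues` unless the next residue is in `blocked_by`.

    Returns (start, end, peptide) tuples using 0-based, end-exclusive coordinates.
    """
    protein = protein.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        is_last = i == len(protein) - 1
        if residue in residues and not is_last and protein[i + 1] not in blocked_by:
            peptides.append((start, i + 1, protein[start:i + 1]))
            start = i + 1
    peptides.append((start, len(protein), protein[start:]))
    return peptides


def peptides_covering(protein: str, region_start: int, region_end: int, **rule) -> list[tuple[int, int, str]]:
    """Return the peptides that contain the whole region [region_start, region_end)."""
    return [p for p in cleave(protein, **rule) if p[0] <= region_start and region_end <= p[1]]


if __name__ == "__main__":
    toy = "MGSKPLCCDEFRAWYKNTE"  # hypothetical sequence
    print(cleave(toy))
    print(peptides_covering(toy, 5, 10))
```

An enzyme rule is a candidate for generating a desired fragment when `peptides_covering` returns a peptide for the region of interest; an empty list means the rule cuts inside that region.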

F. Modification of nucleic acid and polypeptide sequences

The nucleotide sequences used to transfect the host cells used for production of recombinant 
binding domain polypeptides can be modified according to standard techniques to yield binding domain polypeptides
with a variety of desired properties. The binding domain polypeptides of the present invention can be readily
designed and manufactured utilizing various recombinant DNA techniques well known to those skilled in the art. For
example, the binding domain polypeptides can vary from the naturally-occurring sequence at the primary structure
level by amino acid insertions, substitutions, deletions, and the like. These modifications can be used in a number
of combinations to produce the final modified protein chain.

The amino acid sequence variants can be prepared with various objectives in mind, including 
facilitating purification and preparation of the recombinant polypeptides. The modified polypeptides are also useful 
for modifying plasma half-life, improving therapeutic efficacy, and lessening the severity or occurrence of side effects
during therapeutic use. The amino acid sequence variants are usually predetermined variants not found in nature but
exhibit the same immunogenic activity as naturally occurring polypeptides. For instance, polypeptide fragments
comprising only a portion (usually at least about 60-80%, typically 90-95%) of the primary structure may be
produced. For use as vaccines, polypeptide fragments are typically preferred so long as at least one epitope capable 
of eliciting production of blocking antibodies remains. 

In general, modifications of the sequences encoding the binding domain polypeptides may be readily
accomplished by a variety of well-known techniques, such as site-directed mutagenesis (see, Gillman and Smith, Gene
8:81-97 (1979) and Roberts, S. et al., Nature 328:731-734 (1987)). One of ordinary skill will appreciate that the
effect of many mutations is difficult to predict. Thus, most modifications are evaluated by routine screening in a
suitable assay for the desired characteristic. For instance, changes in the immunological character of the polypeptide
can be detected by an appropriate competitive binding assay. Modifications of other properties such as redox or
thermal stability, hydrophobicity, susceptibility to proteolysis, or the tendency to aggregate are all assayed according
to standard techniques.

G. Diagnostic and Screening Assays 

The polypeptides and nucleic acids of the invention can be used in diagnostic applications for the
detection of merozoites or nucleic acids in a biological sample. The presence of parasites can be detected using
several well recognized specific binding assays based on immunological results. (See U.S. Patents 4,366,241;
4,376,110; 4,517,288; and 4,837,168). For instance, labeled monoclonal antibodies to polypeptides of the invention
can be used to detect merozoites in a biological sample. Alternatively, labelled polypeptides of the invention can be
used to detect the presence of antibodies to SABP or DABP in a biological sample. For a review of the general
procedures in diagnostic immunoassays, see also Basic and Clinical Immunology 7th Edition (D. Stites and A. Terr
ed.) 1991.

In addition, modified polypeptides, antibodies or other compounds capable of inhibiting the 
interaction between SABP or DABP and erythrocytes can be assayed for biological activity. For instance, 
polypeptides can be recombinantly expressed on the surface of cells and the ability of the cells to bind erythrocytes
can be measured as described below. Alternatively, peptides or antibodies can be tested for the ability to inhibit binding
between erythrocytes and merozoites or SABP and DABP.

Cell-free assays can also be used to measure binding of DABP or SABP polypeptides to isolated Duffy
antigen or glycophorin polypeptides. For instance, the erythrocyte proteins can be immobilized on a solid surface and
binding of labelled SABP or DABP polypeptides can be measured.

Many assay formats employ labelled assay components. The labelling systems can be in a variety of forms. 
The label may be coupled directly or indirectly to the desired component of the assay according to methods well
known in the art. A wide variety of labels may be used. The component may be labelled by any one of several
methods. The most common method of detection is the use of autoradiography with 3H, 125I, 35S, 14C, or 32P
labelled compounds or the like. Non radioactive labels include ligands which bind to labelled antibodies, fluorophores,
chemiluminescent agents, enzymes, and antibodies which can serve as specific binding pair members for a labelled 
ligand. The choice of label depends on sensitivity required, ease of conjugation with the compound, stability 
requirements, and available instrumentation. 

In addition, the polypeptides of the invention can be assayed using animal models, well known to those
of skill in the art. For P. falciparum the in vivo models include Aotus sp. monkeys or chimpanzees; for P. vivax the
in vivo models include Saimiri monkeys.

In the case of the use of nucleic acids for diagnostic purposes, standard nucleic acid hybridization
techniques can be used to detect the presence of the genes identified here (e.g., members of the DBL family). If
desired, nucleic acids in the sample may first be amplified using standard procedures such as PCR. Diagnostic kits 
comprising the appropriate primers and probes can also be prepared. 

H. DBL Targeted Therapeutics

DBL polypeptides are expressed on the surface of Plasmodium infected erythrocytes. As such, they
present ideal targets for therapeutics which target infected erythrocytes. In one preferred embodiment of the
present invention, cytotoxic antibodies or antibody fusion proteins with cytotoxic agents are targeted against DBL
proteins, killing infected erythrocytes and inhibiting the reproduction of Plasmodium in an infected host.

The procedure for attaching a cytotoxic agent to an antibody will vary according to the chemical
structure of the agent. Antibodies and cytotoxic agents are typically bound together chemically or, where the
antibody and cytotoxic agents are both polypeptides, are optionally synthesized recombinantly as a fusion protein.
Polypeptides typically contain a variety of functional groups, e.g., carboxylic acid (COOH) or free amine (NH2) groups,
which are available for reaction with a suitable functional group on either the antibody or the cytotoxic agent. 

Alternatively, antibodies or cytotoxic agents are derivatized to attach additional reactive functional
groups. The derivatization optionally involves attachment of linker molecules such as those available from Pierce
Chemical Company, Rockford Illinois. A "linker", as used herein, is a molecule that is used to join the nucleic acid 
binding molecule to the receptor ligand. The linker is capable of forming covalent bonds to both the antibody and 
the cytotoxic agent. Suitable linkers are well known to those of skill in the art and include, but are not limited to,
straight or branched-chain carbon linkers, heterocyclic carbon linkers, or peptide linkers. Where the antibody and the
cytotoxic agent are polypeptides, the linkers are joined to the constituent amino acids through their side groups (e.g.,
through a disulfide linkage to cysteine) or to the alpha carbon amino and carboxyl groups of the terminal amino acids.

A bifunctional linker having one functional group reactive with a group on a particular ligand, and
another group reactive with a nucleic acid binding molecule, can be used to form the desired conjugate. Alternatively, 
derivatization can proceed through chemical treatment of the ligand or nucleic acid binding molecule, e.g., glycol 
cleavage of the sugar moiety of a glycoprotein with periodate to generate free aldehyde groups. The free aldehyde 
groups on the glycoprotein may be reacted with free amine or hydrazine groups on an agent to bind the agent thereto
(See, e.g., U.S. Patent No. 4,671,958). Procedures for generation of free sulfhydryl groups on polypeptides are
known (See, e.g., U.S. Pat. No. 4,659,839).

Many procedures and linker molecules for attachment of various compounds to proteins are known. 
See, for example, European Patent Application No. 188,256; U.S. Patent Nos. 4,671,958, 4,659,839, 4,414,148,
4,699,784; 4,680,338; 4,569,789; and 4,589,071; and Borlinghaus et al., Cancer Res. 47: 4071-4075 (1987). In
particular, production of various antibody conjugates is well-known within the art and can be found, for example in
Thorpe et al., Monoclonal Antibodies in Clinical Medicine, Academic Press, pp. 168-190 (1982), Waldmann, Science,
252: 1657 (1991), and U.S. Patent Nos. 4,545,985 and 4,894,443. 

A number of antibodies which bind cell surface receptors have been converted to form suitable
for incorporation into fusion proteins, and similar strategies are used to create fusion-protein antibodies which bind
DBL polypeptides (see Batra et al., Mol. Cell. Biol., 11: 2200-2205 (1991); Batra et al., Proc. Natl. Acad. Sci. USA,
89: 5867-5871 (1992); Brinkmann, et al., Proc. Natl. Acad. Sci. USA, 88: 8616-8620 (1991); Brinkmann et al., Proc.
Natl. Acad. Sci. USA, 90: 547-551 (1993); Chaudhary et al., Proc. Natl. Acad. Sci. USA, 87: 1066-1070 (1990);
Friedman et al., Cancer Res. 53: 334-339 (1993); Kreitman et al., J. Immunol., 149: 2810-2815 (1992); Nicholls
et al., J. Biol. Chem., 268: 5302-5308 (1993); and Wells, et al., Cancer Res., 52: 6310-6317 (1992), respectively).

B. Production of Fusion Proteins 

Where the antibody fragment and/or the cytotoxic agents are relatively short polypeptides (i.e.,
less than about 50 amino acids) they are often synthesized using standard chemical peptide synthesis techniques.
Where both molecules are relatively short, a chimeric molecule is optionally synthesized as a single contiguous
polypeptide. Alternatively, the ligand and the nucleic acid binding molecule can be synthesized separately and then
fused chemically.

Solid phase synthesis in which the C-terminal amino acid of the sequence is attached to an 
insoluble support followed by sequential addition of the remaining amino acids in the sequence is a preferred method 
for the chemical synthesis of the ligands of this invention. Techniques for solid phase synthesis are described by
Barany and Merrifield, Solid-Phase Peptide Synthesis; pp. 3-284 in The Peptides: Analysis, Synthesis, Biology. Vol.
2: Special Methods in Peptide Synthesis, Part A., Merrifield, et al., J. Am. Chem. Soc., 85: 2149-2156 (1963), and
Stewart et al., Solid Phase Peptide Synthesis, 2nd ed. Pierce Chem. Co., Rockford, Ill. (1984).

In a preferred embodiment, the fusion molecules of the invention are synthesized using recombinant 
nucleic acid methodology. Generally this involves creating a nucleic acid sequence that encodes the receptor-targeted
fusion molecule, placing the nucleic acid in an expression cassette under the control of a particular promoter,
expressing the protein in a host, isolating the expressed protein and, if required, renaturing the protein. Techniques
sufficient to guide one of skill through such procedures are found in, e.g., Berger, Sambrook, Ausubel, Innis, and
Freshney (all supra). 

While the two molecules are often joined directly together, one of skill will appreciate that the 
molecules may be separated by a peptide spacer consisting of one or more amino acids. Generally the spacer will
have no specific biological activity other than to join the proteins or to preserve some minimum distance or other
spatial relationship between them. However, the constituent amino acids of the spacer may be selected to influence
some property of the molecule such as the folding, net charge, or hydrophobicity.
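
To illustrate how the choice of spacer residues can shift a property such as net charge, the following is a crude sketch. It counts lysine and arginine as +1 and aspartate and glutamate as -1, and ignores histidine, the termini, and pH, so it is an approximation only. The spacer and partner sequences are hypothetical placeholders, not sequences disclosed in this specification.

```python
# Crude side-chain charges near neutral pH (assumption: His and termini ignored).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Return the approximate net charge of a peptide from its K/R/D/E content."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in peptide.upper())


def fuse(n_terminal_part: str, spacer: str, c_terminal_part: str) -> str:
    """Join two polypeptide sequences through a peptide spacer."""
    return n_terminal_part + spacer + c_terminal_part


if __name__ == "__main__":
    antibody_fragment = "MKTAYD"  # hypothetical placeholder
    cytotoxic_agent = "GSRLE"     # hypothetical placeholder
    for spacer in ("GGGGS", "GSEGDE", "GKGRK"):
        fusion = fuse(antibody_fragment, spacer, cytotoxic_agent)
        print(spacer, net_charge(spacer), net_charge(fusion))
```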

Once expressed, recombinant fusion proteins can be purified according to standard procedures, 
including ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like
(see, generally, R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982), Deutscher, Methods in Enzymology Vol.
182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990)). Substantially pure compositions of about 50
to 95% homogeneity are preferred, and 80 to 95% or greater homogeneity are most preferred for use as therapeutic
agents.

One of skill in the art will recognize that after chemical synthesis, biological expression, or 
purification, the fusion molecule may possess a conformation substantially different than the native conformations
of the constituent polypeptides. In this case, it is often necessary to denature and reduce the polypeptide and then
to cause the polypeptide to re-fold into the preferred conformation. Methods of reducing and denaturing proteins
and inducing re-folding are well known to those of skill in the art (See, Debinski et al., J. Biol. Chem., 268: 14065-
14070 (1993); Kreitman and Pastan, Bioconjug. Chem., 4: 581-585 (1993); and Buchner, et al., Anal. Biochem., 205:
263-270 (1992)).

I. Pharmaceutical compositions comprising binding domain polypeptides
The polypeptides of the invention are useful in therapeutic and prophylactic applications for the 
treatment of malaria. Pharmaceutical compositions of the invention are suitable for use in a variety of drug delivery 
systems. Suitable formulations for use in the present invention are found in Remington's Pharmaceutical Sciences,
Mack Publishing Company, Philadelphia, PA, 17th ed. (1985). For a brief review of methods for drug delivery, see
Langer, Science 249:1527-1533 (1990).

The polypeptides of the present invention can be used in pharmaceutical and vaccine compositions 
that are useful for administration to mammals, particularly humans. The polypeptides can be administered together 
in certain circumstances, e.g., where infection by both P. falciparum and P. vivax is likely. Thus, a single
pharmaceutical composition can be used for the treatment or prophylaxis of malaria caused by both parasites.

The compositions are suitable for single administrations or a series of administrations. When given 
as a series, inoculations subsequent to the initial administration are given to boost the immune response and are 
typically referred to as booster inoculations. 

The pharmaceutical compositions of the invention are intended for parenteral, topical, oral or local 
administration. Preferably, the pharmaceutical compositions are administered parenterally, e.g., intravenously, 
subcutaneously, intradermally, or intramuscularly. Thus, the invention provides compositions for parenteral 
administration that comprise a solution of the agents described above dissolved or suspended in an acceptable carrier, 
preferably an aqueous carrier. A variety of aqueous carriers may be used, e.g., water, buffered water, 0.4% saline, 
0.3% glycine, hyaluronic acid and the like. These compositions may be sterilized by conventional, well known 
sterilization techniques, or may be sterile filtered. The resulting aqueous solutions may be packaged for use as is, 
or lyophilized, the lyophilized preparation being combined with a sterile solution prior to administration. The 
compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological 
conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting agents and the like, for 
example, sodium acetate, sodium lactate, sodium chloride, potassium chloride, calcium chloride, sorbitan monolaurate, 
triethanolamine oleate, etc. 

For solid compositions, conventional nontoxic solid carriers may be used which include, for example, 
pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharin, talcum, cellulose, glucose, 
sucrose, magnesium carbonate, and the like. For oral administration, a pharmaceutically acceptable nontoxic 
composition is formed by incorporating any of the normally employed excipients, such as those carriers previously 
listed, and generally 10-95% of active ingredient and more preferably at a concentration of 25%-75%. 

For aerosol administration, the polypeptides are preferably supplied in finely divided form along with 
a surfactant and propellant. The surfactant must, of course, be nontoxic, and preferably soluble in the propellant. 
Representative of such agents are the esters or partial esters of fatty acids containing from 6 to 22 carbon atoms, 
such as caproic, octanoic, lauric, palmitic, stearic, linoleic, linolenic, olesteric and oleic acids with an aliphatic 
polyhydric alcohol or its cyclic anhydride. Mixed esters, such as mixed or natural glycerides may be employed. A 
carrier can also be included, as desired, as with, e.g., lecithin for intranasal delivery. 

In certain embodiments patients with malaria may be treated with SABP or DABP polypeptides 
or other specific blocking agents (e.g., monoclonal antibodies) that prevent binding of Plasmodium merozoites and 
schizonts to the erythrocyte surface. 

The amount administered to the patient will vary depending upon what is being administered, the 
state of the patient and the manner of administration. In therapeutic applications, compositions are administered 
to a patient already suffering from malaria in an amount sufficient to inhibit spread of the parasite through 
erythrocytes and thus cure or at least partially arrest the symptoms of the disease and its complications. An amount 
adequate to accomplish this is defined as a "therapeutically effective dose." Amounts effective for this use will depend 
on the severity of the disease, the particular composition, and the weight and general state of the patient. Generally, 
the dose will be in the range of about 1 mg to about 5 gm per day, preferably about 100 mg per day, for a 70 kg 
patient. 

Alternatively, the polypeptides of the invention can be used prophylactically as vaccines. The 
vaccines of the invention contain as an active ingredient an immunogenically effective amount of the binding domain 
polypeptide or of a recombinant virus as described herein. The immune response may include the generation of 
antibodies; activation of cytotoxic T lymphocytes (CTL) against cells presenting peptides derived from the peptides 
encoded by the SABP, DABP or DBL sequences of the present invention, or other mechanisms well known in the art. 
See, e.g., Paul, Fundamental Immunology, Second Edition (Raven Press, New York, NY) for a description of immune 
response. Useful carriers are well known in the art, and include, for example, thyroglobulin, albumins such as human 
serum albumin, tetanus toxoid, polyamino acids such as poly(D-lysine:D-glutamic acid), influenza, hepatitis B virus core 
protein, hepatitis B virus recombinant vaccine. The vaccines can also contain a physiologically tolerable (acceptable) 
diluent such as water, phosphate buffered saline, or saline, and further typically include an adjuvant. Adjuvants such 
as incomplete Freund's adjuvant, aluminum phosphate, aluminum hydroxide, or alum are materials well known in the 
art. 

The DNA or RNA encoding the SABP or DABP binding domains and the DBL gene family motifs 
may be introduced into patients to obtain an immune response to the polypeptides which the nucleic acid encodes. 
See Wolff et al., Science 247: 1465-1468 (1990), which describes the use of nucleic acids to produce expression of 
the genes which the nucleic acids encode. 

Vaccine compositions containing the polypeptides, nucleic acids or viruses of the invention are 
administered to a patient to elicit a protective immune response against the polypeptide. A "protective immune 
response" is one which prevents or inhibits the spread of the parasite through erythrocytes and thus at least partially 
prevents the symptoms of the disease and its complications. An amount sufficient to accomplish this is defined as 
an "immunogenically effective dose." Amounts effective for this use will depend on the composition, the manner 
of administration, the weight and general state of health of the patient, and the judgment of the prescribing 
physician. For peptide compositions, the general range for the initial immunization (that is, for therapeutic or 
prophylactic administration) is from about 100 µg to about 1 gm of peptide for a 70 kg patient, followed by 
boosting dosages of from about 100 µg to about 1 gm of the polypeptide pursuant to a boosting regimen over 
weeks to months depending upon the patient's response and condition, e.g., as determined by measuring levels of parasite in the 
patient's blood. For nucleic acids, typically 30-1000 µg of nucleic acid is injected into a 70 kg patient, more typically 
about 50-150 µg of nucleic acid is injected into a 70 kg patient, followed by boosting doses as appropriate. 

The following examples illustrate preferred embodiments of the invention. 
EXAMPLE 1: Identification of the amino-terminal, cysteine-rich region of SABP and DABP binding 
domains for erythrocytes 

1. Expression of the SABP binding domain polypeptide on the surface of Cos cells. 
To demonstrate that the amino-terminal, cysteine-rich region of the SABP protein is the sialic acid binding 
region, this region of the protein was expressed on the surface of mammalian Cos cells in vitro. This DNA sequence 
is from position 1 to position 1848 of the SABP DNA sequence (SEQ ID NO:3). Polymerase chain reaction 
technology (PCR) was used to amplify this region of the SABP DNA directly from the cloned gene. 

Sequences corresponding to restriction endonuclease sites for PvuII or ApaI were incorporated into 
the oligonucleotide sequence of the probes used in PCR amplification in order to facilitate insertion of the 
PCR-amplified regions into the pRE4 vector (see below). The specific oligonucleotides 
5'-ATCGATCAGCTGGGAAGAAATACTTCATCT-3' (SEQ ID NO:17) and 5'-ATCGATGGGCCCCGAAGTTrGnCATTATT-3' 
(SEQ ID NO:18) were synthesized. These oligonucleotides were used as primers to PCR-amplify the region of the 
DNA sequence encoding the cysteine-rich amino-terminal region of the SABP protein. 

PCR conditions were based on the standard described in Saiki, et al., Science 239: 487-491 (1988). 
Template DNA was provided from cloned fragments of the gene encoding SABP which had been spliced and re-cloned 
as a single open-reading frame piece. 

The vector, pRE4, used for expression in Cos cells is shown in Figure 2. The vector has an SV40 origin 
of replication, an ampicillin resistance marker and the Herpes simplex virus glycoprotein D gene (HSV glyd) cloned 
downstream of the Rous sarcoma virus long terminal repeats (RSV LTR). Part of the extracellular domain of the HSV 
glyd gene was excised using the PvuII and ApaI sites in HSV glyd. 

As described above, the PCR oligonucleotide primers contained the PvuII or ApaI restriction sites. 
The PCR-amplified DNA fragments obtained above were digested with the restriction enzymes PvuII and ApaI and 
cloned into the PvuII and ApaI sites of the vector pRE4. These constructs were designed to express regions of the 
SABP protein as chimeric proteins with the signal sequence of HSV glyd at the N-terminal end and the 
transmembrane and cytoplasmic domain of HSV glyd at the C-terminal end. The signal sequence of HSV glyd targets 
these chimeric proteins to the surface of Cos cells and the transmembrane segment of HSV glyd anchors these 
chimeric proteins to the Cos cell surface. 

Mammalian Cos cells were transfected with the pRE4 constructs containing the PCR-amplified 
SABP DNA regions, by calcium phosphate precipitation according to standard techniques. 

2. Expression of the DABP binding domain polypeptide on the surface of Cos cells. 

To demonstrate that the amino-terminal, cysteine-rich region of the DABP protein is the binding 
domain, this region was expressed on the surface of Cos cells. This region of the DNA sequence from position 1-975 
was first PCR-amplified (SEQ ID No 1). 

Sequences corresponding to restriction endonuclease sites for PvuII or ApaI were incorporated into 
the oligonucleotide probes used for PCR amplification in order to facilitate subsequent insertion of the amplified DNA 
into the pRE4 vector, as described above. The oligonucleotides 5'-TCTCGTCAGCTGACGATCTCTAGTGCTATT-3' (SEQ 
ID NO:19) and 5'-ACGAGTGGGCCCTGTCACAACTTCCTGAGT-3' (SEQ ID NO:20) were synthesized. These 
oligonucleotides were used as primers to amplify the region of the DABP DNA sequence encoding the cysteine-rich, 
amino-terminal region of the DABP protein directly from the cloned DABP gene, using the same conditions described 
above. 
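
As an illustration of the primer design described above, the following minimal Python sketch locates the restriction site built into each DABP primer and splits the primer into a 5' clamp, the site, and the gene-specific 3' portion. The primer sequences are those printed for SEQ ID NO:19 and SEQ ID NO:20; the recognition sequences CAGCTG (PvuII) and GGGCCC (ApaI) are the standard ones for these enzymes. The function names are hypothetical and are not part of the disclosed method.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"PvuII": "CAGCTG", "ApaI": "GGGCCC"}

# DABP primers as printed in the text, written 5' to 3'.
PRIMERS = {
    "SEQ ID NO:19": "TCTCGTCAGCTGACGATCTCTAGTGCTATT",
    "SEQ ID NO:20": "ACGAGTGGGCCCTGTCACAACTTCCTGAGT",
}


def find_sites(primer):
    """Return {enzyme: 0-based offset} for each recognition site found."""
    return {name: primer.find(site)
            for name, site in SITES.items() if site in primer}


def split_primer(primer, site):
    """Split a primer into (5' clamp, site, gene-specific 3' portion)."""
    start = primer.index(site)
    return primer[:start], site, primer[start + len(site):]


if __name__ == "__main__":
    for label, seq in PRIMERS.items():
        for enzyme, offset in find_sites(seq).items():
            clamp, site, tail = split_primer(seq, SITES[enzyme])
            print(label, enzyme, offset, clamp, site, tail)
```

In this reading each 30-base primer carries a six-base clamp, a six-base restriction site, and 18 bases that anneal to the DABP sequence.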

The same pRE4 vector described above in the section on expression of SABP regions in Cos cells 
was also used as a vector for the DABP DNA regions. 

3. Binding studies with erythrocytes. 

To demonstrate their ability to bind human erythrocytes, the transfected Cos cells expressing 
binding domains from DABP and SABP were incubated with erythrocytes for two hours at 37°C in culture media 
(DMEM/10% FBS). The non-adherent erythrocytes were removed with five washes of phosphate-buffered saline and 
the bound erythrocytes were observed by light microscopy. Cos cells expressing the amino-terminal, cysteine-rich 
SABP polypeptides on their surface bound untreated human erythrocytes, but did not bind neuraminidase-treated 
erythrocytes, that is, erythrocytes which lack sialic acid residues on their surface. Cos cells expressing other regions 
of the SABP protein on their surface did not bind human erythrocytes. These results identified the amino-terminal, 
cysteine-rich region of SABP as the erythrocyte binding domain and indicated that the binding of Cos cells expressing 
these regions to human erythrocytes is specific. Furthermore, the binding of the expressed region to erythrocytes 
is identical to the binding pattern seen for the authentic SABP 175 molecule upon binding to erythrocytes. 

Similarly, Cos cells expressing the amino-terminal, cysteine-rich region of DABP on their surface 
bound Duffy-positive human erythrocytes, but did not bind Duffy-negative human erythrocytes, that is, erythrocytes 
which lack the Duffy blood group antigen. Cos cells expressing other regions of the DABP protein on their surface 
did not bind human erythrocytes. These results identified the amino-terminal, cysteine-rich region of DABP as the 
erythrocyte binding domain and indicated that the binding of the Cos cells was specific. 

EXAMPLE 2: Isolation of polynucleotide sequences in the DBL family 

The P. falciparum clones and cell line used include the following. P. falciparum clones 3D7, D10, LF4/1, 
Camp/A1, SL/D6, HB3, 7G8, V1/S, T2/C6, KMWII, HG2F6, FCR3/A2 and Dd2 have been previously tabulated (Dolan, 
et al. (1993), Mol. Biochem. Parasitol. 61, 137-142). Line Dd2/NM1 was selected from clone Dd2 for invasion via 
a sialic acid-independent pathway (Dolan, et al. (1990), J. Clin. Invest. 86, 618-624). All parasites were maintained 
in vitro by standard methods (Trager, et al. (1976), Science 193, 673-675). 

DNA and RNA Isolation and Analysis. DNA was extracted as described (Peterson, et al. (1990), 
Proc. Natl. Acad. Sci. USA 87, 3018-3022). Endonuclease digestion, agarose gel electrophoresis, and filter 
hybridizations were performed by standard methods (Sambrook, et al., 1989). All hybridizations were at 56°C 
(Sambrook, et al., 1989). Blots were washed for 2 min. at room temperature in 2x standard saline/phosphate/EDTA 
(SSPE) with 0.5% SDS, followed by two higher stringency washes at 50°C in 0.3x SSPE with 0.5% SDS. Parasite 
chromosomes were embedded in agarose blocks and separated by pulsed field gel electrophoresis (Dolan, et al. 
(1993), Methods Mol. Biol. 21, 319-332). RNA was isolated from cultured parasites by LiCl extraction of 
Catrimox-14-precipitated RNA (Dahle, et al. (1993), BioTechniques 15, 1102-1105). Agarose gel electrophoresis of 
total RNA and filter hybridizations were performed by standard methods (Sambrook, et al., 1989). 

Oligonucleotide Primers and PCR. Primers specific for E31a used in RT-PCR to test for 
expression of this sequence were E31aT2 (5'-AGA-CCT-CAA-TTT-CTA-AG-3') (SEQ ID NO:21) and E31aRev1 
(5'-AAT-CGC-GAG-CAT-CAT-CTG-3') (SEQ ID NO:22). 

Two primers were used to amplify additional sequences from genes encoding DBL domains. These 
were designed from conserved amino acids encoded in the DBL domain of the eba-175 and E31a sequences. After 
adaptation to incorporate the most frequently used P. falciparum codons, forward primer UNIEBP5' 
[5'-CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG-3'] (SEQ ID NO:23), based upon the amino acid sequence 
PRRQKLC, and reverse primer UNIEBP3' [5'-CCA-(A/T)C(T/G)-(T/G)A(A/G)-(A/G)AA-nG-(A/T)GG-3'] (SEQ ID NO:24), 
based upon the amino acid sequence PQFLRW, were synthesized. 

RT-PCR amplifications were performed as described (Kawasaki, et al. (1990), PCR Protocols: A 
Guide to Methods and Applications, eds. Innis, M.A., Gelfand, D.H., Sninsky, J.J. & White, T.J. (Academic, San 
Diego), pp. 21-27). In brief, 0.5 to 1 mg of total RNA was treated with RQ1 DNAse (Promega), phenol/chloroform 
extracted, and ethanol precipitated. The RNA was then annealed with random oligonucleotide primers and extended 
with Superscript reverse transcriptase (GIBCO/BRL). PCR cycling conditions were 94°C for 10 sec, 45°C for 15 
sec, and 72°C for 45 sec, for 30 cycles. All PCRs were performed in an Idaho Technology air thermal cycler using 
buffer containing 2 mM Mg2+. 
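
To make the degenerate-primer notation used for UNIEBP5' concrete, the following minimal Python sketch parses a primer written with (X/Y) alternatives, counts its degeneracy, and enumerates the individual oligonucleotides in the mixture. The primer string is transcribed from the forward primer as printed in the text; the helper names are hypothetical, and the sketch is an illustration of the notation rather than part of the disclosed method.

```python
import itertools
import re

# Forward primer UNIEBP5' as printed in the text (5' to 3').
UNIEBP5 = "CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG"


def parse_degenerate(primer):
    """Return a list of base choices, one list per primer position."""
    primer = primer.replace("-", "")
    # Each token is either a parenthesized set of alternatives or one base.
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGT])", primer)
    return [alt.split("/") if alt else [base] for alt, base in tokens]


def degeneracy(primer):
    """Number of distinct sequences represented by the primer."""
    total = 1
    for choices in parse_degenerate(primer):
        total *= len(choices)
    return total


def expand(primer):
    """Enumerate every individual sequence in the degenerate mixture."""
    return ["".join(p) for p in itertools.product(*parse_degenerate(primer))]


if __name__ == "__main__":
    print(len(parse_degenerate(UNIEBP5)), "positions")
    print(degeneracy(UNIEBP5), "sequences in the mixture")
    print(expand(UNIEBP5)[0])
```

With five two-fold positions, the forward primer as printed corresponds to a mixture of 32 distinct 20-base oligonucleotides.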

PCR amplification products were separated by use of PCR Purity Plus gels and protocols (AT 
Biochem, Malvern, PA). 

DNA Clones and Hybridization Probes. Clone pE31a was isolated from a genomic library 
prepared from the region of chromosome 7 linked to chloroquine resistance (Walker-Jonah, et al. (1992), Mol. 
Biochem. Parasitol. 51, 313-320). Clone pS31H (GenBank accession no. L38454), containing an insert encompassing 
that of pE31a, was cloned from a size-selected Hind III restriction digest of Dd2 genomic DNA. 

Clone pEBLel was cloned from a RT-PCR of Dd2 cDNA after amplification with primers UNIEBP5' 
(SEQ ID NO:23) and UNIEBP3' (SEQ ID NO:24). Clone pEBP1.2 (GenBank accession no. L38450), containing an insert 
encompassing that of pEBLel, was isolated from a Dd2 cDNA library probed with pEBLel. DBL-encoding sequences 
of dbl-nm1-4 (GenBank accession no. L38455) and dbl-nm1-5 (GenBank accession no. L38453) were amplified by 
RT-PCR from first strand cDNA of line Dd2/NM1 using primers UNIEBP5' and UNIEBP3'. Sequencing was performed 
on double stranded DNA templates by standard protocols for the dideoxynucleotide method (Sequenase; U.S. 
Biochemicals). 

Sequences related to the E31a sequence were detected with the 3005 bp insert of clone pS31H. 
The eba-175 gene was detected with a PCR amplified probe consisting of the first 1825 bp of the coding sequence. 
ebl-1 sequences were detected with the 2098 bp insert of clone pEBP1.2. All probes were comparable in 
organization, each containing a region encoding at least one DBL domain and varying amounts of flanking sequence. 

Homology searches and alignments. Homology searches were performed with BLAST and the 
Genetics Computer Group program FASTA (Altschul, et al. (1990), J. Mol. Biol. 215, 403-410; Devereux, et al. 
(1984), Nucleic Acids Res. 12 (1 Pt 1), 387-395). Optimized alignments were produced with MACAW sequence 
alignment software (Schuler, et al. (1991), Proteins 9, 180-190). 

Multiple P. falciparum sequences encode DBL domains. Positional cloning experiments directed 
to P. falciparum chromosome 7 identified an ORF (E31a) encoding a DBL domain that is homologous to the domains 
found in the P. vivax and P. knowlesi DABPs and the P. falciparum SABP. Figure 4 shows the relative position of 
the E31a ORF on chromosome 7. 

The homology between the DBL domains of E31a and the erythrocyte-binding proteins is due to 
the presence of short motifs of highly conserved amino acids. These well-conserved stretches are separated by 
non-homologous sequences and by deletions and insertions that vary the size of the domain by greater than 150 aa. 
The typical DBL domain contains 12 or more cysteine residues and has 7 conserved tryptophan residues. Additional 
well conserved amino acids include 4 arginines, 3 aspartates, 9 positions with aliphatic residues (alanine, isoleucine, 
leucine, or valine) and 4 with aromatic amino acids (tryptophan, phenylalanine, or tyrosine). 

Probes spanning the sequence that encodes the E31a DBL domain hybridized to multiple fragments 
within a single restriction digest and yielded bands that varied among parasite lines. The numerous distinct bands 
from a selection of different parasite DNAs indicated a large number of diverse but related elements. These multiple 
bands varied among different P. falciparum clones, in contrast to the well-conserved, single-copy signal obtained with 
the eba-175 probe. 

Because of the numerous cross-hybridizing sequences, it seemed likely that many of these related 
sequences would be on different chromosomes of the parasite. PFG electrophoresis of P. falciparum Ml 
chromosomes and hybridization with the E31a probe identified a number of cross-hybridizing sequences on multiple 
chromosomes. A control hybridization with the eba-175 probe under identical conditions yielded a single band of 
hybridization from chromosome 7. 

RNA Analysis of DBL Elements. Sequences from E31a (pS31H insert) were used to probe RNA 
blots for corresponding transcripts. No hybridization was detected. Because it was still possible that a message 
of low abundance was not being detected on the RNA blot, RT-PCR was used as a means of more sensitive 
detection. For this purpose, cDNA was generated by RT from random primers annealed to DNAse-treated total RNA. 
E31a-specific oligonucleotides were then used to test for amplification from the cDNA. No amplification of the E31a 
sequence was obtained, while genomic DNA controls and amplification from cDNA by dihydrofolate 
reductase/thymidylate synthetase-specific primers yielded the expected bands. A screen of a cDNA library with E31a 
specific probes also failed to detect any clones hybridizing with the ORF. These results indicate that E31a is either 
a pseudogene, or is expressed in parasite strains or stages not examined in this work. 

A PCR Method to Isolate Sequences Encoding DBL Domains. The identification of short 
conserved motifs in DBL domains that otherwise have extreme diversity led to a PCR strategy using degenerate 
oligonucleotide primers designed from conserved amino acid sequences in the DBL domains. Sequences PRRQKLC 
and PQFLRW were judged most suitable for minimizing degeneracy while allowing amplification of expressed DBL 
sequences. After these considerations and adjustment for P. falciparum codon usage, primers UNIEBP5' and 
UNIEBP3' were synthesized. 

While some P. falciparum lines yielded similar patterns of amplified bands (e.g., Dd2 and MCamp; 
FCR3/A2 and K-1), no two separate isolates showed identical patterns, reflecting the diversity of the DBL domains 
in the parasite lines. A few bands of the same apparent size were present in many isolates. These included a 
consistent 490 bp product that was determined to be the eba-175 gene by its expected size and hybridization to 
a gene-specific probe. The number of discernible bands probably underestimates the number of amplifiable sequences 
because of overlapping products of the same size and possible preferential amplification of some sequences over 
others. Nevertheless, the parasite-specific patterns in the amplified bands may provide a means to quickly type 
isolates and serve as a measure of parasite diversity in field samples. 

To identify DBL-encoding sequences in RNA transcripts, the UNIEBP primers were used to amplify 
first-strand cDNAs generated from DNAse-treated RNA preparations. Amplified products from Dd2, 3D7, HB3 and 
MCAMP cDNAs had diverse sizes ranging from 400 bp to nearly 1 kb. These included a band at 480-500 bp that 
was determined to be eba-175 from its expected size and cross-hybridization to an eba-175-specific probe. Other 
bands were from amplification of different transcripts encoding DBL domains. Dd2/NM1 RNA, for example, yielded 
bands above the eba-175 product that included two related sequences (dbl-nm1-4, dbl-nm1-5). These bands were 
found to be isolate-specific and to have features consistent with the var genes described in Example 3, below. 
Probes that detect dbl-nm1-4 and dbl-nm1-5 hybridized to multiple chromosomes and aligned more closely with E31a 
than with EBA-175 or DABP. 

The RT-PCR amplifications also yielded a consistent band that encoded a novel DBL domain distinct 
from eba-175. A cDNA clone corresponding to this product was isolated by screening a λgt10 Dd2 cDNA library 
with a radiolabeled ebl-1 probe. Sequence from this and additional overlapping cDNA clones confirmed the conserved 
motifs of the DBL domain. The alignment of the predicted amino acid sequences showed that the DBL domain of 
ebl-1 is more similar to eba-175 than to the multicopy genes. There was, however, extensive divergence from 
eba-175 and other known genes outside of the amplified region. 

In contrast to the multicopy hybridization patterns of dbl-nm1-4 and dbl-nm1-5, the ebl-1 sequence, 
like that of eba-175, was found to have hybridization patterns consistent with a conserved single-copy gene. Probes 
specific for ebl-1 hybridized only to chromosome 13, and restriction analysis with the enzymes Cla I, EcoR I, Hind III, 
Hinf I, Nsi I, Rsa I, and Spe I all yielded bands expected from a single copy sequence. RNA blots probed with 
ebl-1-specific sequences showed several bands of hybridization, however, corresponding to 8-9.5 kb transcripts in 
mRNA from the Dd2 and 3D7 parasites. The transcripts of different size may result from alternative start and 
termination points or from incompletely processed species containing introns. 

EXAMPLE 3: Isolation of var genes 

Parasite clones, DNA analysis and Chromosome Mapping. Parasite clones were cultivated by the methods 
of Trager, et al. ((1976), Science 193, 673-675). DNA was extracted from parasite cultures as described (Peterson, 
et al. (1988), Proc. Natl. Acad. Sci. USA 85, 9114-9118) except that the DNA was recovered by ethanol 
precipitation rather than spooling. Fingerprint analysis with the pC4.H32 probe was used to confirm DNA 
preparations (Dolan, et al. (1993), Mol. Biochem. Parasitol. 61, 137-142). Southern blotting to Nytran membranes 
was performed as recommended by the manufacturer (Schleicher & Schuell, Keene, NH). PFG separation of the 14 P. falciparum 
chromosomes and chromosome mapping were performed as described (Wellems, et al. (1987), Cell 49, 633-642; 
Sinnis, et al. (1988), Genomics 3, 287-295). 

RNA isolation. Parasites from 200 ml mixed stage cultures (5-10% parasitemia) were released by saponin 
lysis as for DNA preparations except that the procedures were performed with ice-cold solutions. RNA was 
immediately isolated from the parasite pellet by guanidine thiocyanate/phenol-chloroform methods, recovered and 
treated with RNAase-free DNAse (Creedon, et al. (1994), J. Biol. Chem. 269, 16364-16370). RNA in H2O was 
combined with 2 vol 100% EtOH, distributed into 2 ml vials and frozen as stock at -70°C. RNA was recovered by 
precipitation with 0.1 vol 3M NaOAc. RNA blots were generated and probed as described (Creedon, et al. (1994), 
J. Biol. Chem. 269, 16364-16370). 

YAC isolation, chromosome-segment libraries and cDNA libraries. Overlapping YACs spanning the 300 kb 
segment of chromosome 7 that contains the CQR locus were obtained from a YAC library of a CQR FCR3 parasite 
line (de Bruin, et al. (1992), Genomics 14, 332-339) by the procedures of Lanzer, et al. (1993), Nature 361, 654-657. 
Orientation of the YACs and their overlaps were identified with probes obtained from the YAC ends by inverted PCR. 

Attempts to construct cosmid libraries and large insert (~10 kb) λ libraries from high molecular 
weight P. falciparum genomic DNA yielded only rearranged clones. An alternative approach was therefore taken in 
which chromosome-segment libraries were constructed that contained small (0.5-5 kb) inserts in plasmid vectors. 
Plasmid libraries containing Alu I, Hinf I, Rsa I and Ssp I inserts in pCDNAII were constructed from Dd2 chromosome 
7 restriction fragments purified by pulsed-field gel (PFG) electrophoresis (Wellems, et al. (1991), Proc. Natl. Acad. 
Sci. USA 88, 3382-3386). A plasmid library from a 34 kb Apa I-Sma I restriction fragment of YAC PfYED9 was 
constructed by the same methods. Inserts in the plasmid libraries were generally 0.5-4 kb. 

The λgt10 Dd2 cDNA library was prepared under contract by CloneTech Laboratories Inc. (Palo 
Alto, CA) from the DNAse-treated, polyA+ fraction of Dd2 RNA. The cDNA was generated in two separate reactions 
using oligo-dT primers or random primers. Products of these reactions were combined, processed and cloned into the 
EcoRI site of λgt10. 1.6 x 10^6 independent recombinants were obtained and amplified. 

Isolation of overlapping clones and DNA sequencing. Plasmid clones from the chromosome-segment and 
YAC-segment libraries were picked at random and their locations were established by restriction mapping. After 
sequence data from these clones were generated, overlapping clones were isolated in a process of "chromosome 
walking" by rescreening the libraries with oligonucleotide probes near the ends of sequenced inserts. Sufficient 
divergence was present among repetitive elements in the sequences to allow distinction of clones and unambiguous 
assignment of overlaps (generally 50-200 bp). 
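
The assignment of overlaps between sequenced inserts can be illustrated computationally. The following minimal Python sketch finds the longest exact overlap between the 3' end of one insert sequence and the 5' end of the next, then merges the two into a contiguous sequence. The short sequences and the function names are hypothetical and serve only as an example; the work described here established overlaps by hybridization screening and sequence comparison, and real overlaps were generally 50-200 bp.

```python
def overlap_length(left, right, min_overlap=1):
    """Length of the longest suffix of `left` that is a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    for n in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge(left, right, min_overlap=1):
    """Join two overlapping inserts into one contiguous sequence."""
    n = overlap_length(left, right, min_overlap)
    if n == 0:
        raise ValueError("no overlap of the required length")
    return left + right[n:]


if __name__ == "__main__":
    # Hypothetical insert ends, far shorter than real clones.
    insert_a = "ATGCGTACCTGA"
    insert_b = "CCTGATTAGC"
    print(overlap_length(insert_a, insert_b))
    print(merge(insert_a, insert_b))
```

Requiring a minimum overlap length (the `min_overlap` parameter) is one simple way to avoid spurious joins between repetitive elements, which is the concern the passage raises.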

Sequencing reactions with single-strand M13 DNA (1 µg) and double-strand plasmid DNA (2-5 µg) 
were performed in 96-well polyvinyl chloride U-bottom microassay plates using a Sequenase protocol recommended 
by United States Biochemical Corp. (Cleveland, OH). Reactions were separated by 8M urea-6% polyacrylamide 
sequencing gels and exposed to Kodak BioMax MR film. Sequence data from some clones were also obtained by 
use of an ABI 373A automated DNA sequencer (Applied Biosystems Inc., Foster City, CA). Cycle sequencing 
reactions were performed using the ABI PRISM DyeDeoxy system. 

DNA sequence editing, analyses and display were performed with MacVector software (International 
Biotechnologies Inc., New Haven, CT), BLAST (Altschul, et al. (1990), J. Mol. Biol. 215, 403-410), Genetics Computer 
Group programs (Devereux, et al. (1984), Nucleic Acids Res. 12, 387-395) and the DNADRAW package (Shapiro, et 
al. (1986), Nucleic Acids Res. 14, 65-73) maintained at the National Institutes of Health. 

Identification of a large hypervariable region within a chromosome 7 segment linked to chloroquine 
resistance. Four overlapping yeast artificial chromosomes from the P. falciparum FCR3 line were obtained that span 
the 300 kb chromosome segment linked to CQR, a segment located 300-600 kb from the telomere of chromosome 
7. Figure 5 shows the positions of these YACs (PfYEF2, PfYFE6, PfYKF8, PfYED9) relative to the chromosome map. 
In order to define the structure of this 300 kb segment, we performed comparative hybridizations to search for 
polymorphisms between parasite lines. Clones were randomly picked from chromosome segment-specific plasmid 
libraries and their inserts were hybridized against restriction digests of the YAC and parasite DNAs. Over thirty 
inserts were identified that recognized PfYEF2, PfYFE6 or PfYKF8 and showed a preponderance of single copy 
sequences with few polymorphisms (Alu I, Hinf I, Rsa I and Ssp I digests), consistent with prior findings that 
chromosome internal regions are largely conserved and contain a preponderance of single copy sequences. However, 
fifteen other inserts that recognized PfYED9 showed highly polymorphic sets of repetitive elements in the parasite 
DNAs. Southern analysis indicated that these polymorphic elements were part of a chromosome hypervariable region 
contained within the PfYED9 clone. 

Mapping and DNA sequencing of the hypervariable region spanned by YAC PfYED9. Single copy sequences 
detected by pE45b and pH270.5 flank the hypervariable region spanned by PfYED9 (Figure 5). The pE45b and 
pH270.5 probes were therefore used to assign large restriction fragments on the PfYED9 map and establish enzyme 
recognition sites as reference points. A detailed restriction map of the PfYED9 hypervariable region was then 
developed. Fifteen overlapping clones ( a a"-"f and N h n V in Figure 5) were isolated by a chromosome walking 
approach from Dd2 chromosome subsegment libraries (Wellems et al., supra). The inserts yielded 19.1 kb of 
continuous Dd2 sequence having predicted enzyme recognition sites in perfect accord with the PfYED9 restriction 
map. Such agreement indicates that the Dd2 and FCR3 sequences in this part of the chromosome are very similar, 
despite differences elsewhere in the genome that are evident by restriction analysis. 

We also obtained genomic sequence data from the 34 kb ApaI/SmaI fragment of PfYED9. Purified
PfYED9 DNA was cut with SmaI to yield a 110 kb fragment, which was then isolated by PFG electrophoresis and
digested with ApaI. The resulting 34 kb ApaI/SmaI band was purified by PFG electrophoresis, digested in four
separate reactions by AluI, HinfI, RsaI or SspI and incorporated into a plasmid (pCDNAII) library. Cloned inserts from
the library were checked for hybridization to the PfYED9 34 kb fragment, assigned to the PfYED9 map and
sequenced (Figure 5). Overlapping inserts were obtained by the chromosome walking approach except for three gaps
("t", "z", "0" in Figure 5) which were closed by PCR amplification of PfYED9 DNA using primers from flanking
sequences. The clones from PfYED9 (V'zYk", and "a n + n {J" in Figure 5) yielded 22.2 kb of continuous DNA
sequence that overlaps the Dd2 sequence at the "fT/T junction and has predicted restriction sites that match the
PfYED9 map perfectly. The composite sequence from the Dd2 and PfYED9 segments is 40,171 bp.
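
The size of the overlap between the two sequenced segments is not stated, but it follows from the lengths given. The short Python sketch below works through that arithmetic, taking the composite length as 40,171 bp (the unit used in the following section). Because the segment lengths are quoted only to the nearest 0.1 kb, the result is approximate; the function name is an illustrative assumption.

```python
def implied_overlap_bp(len_a_bp, len_b_bp, composite_bp):
    """Overlap implied when two segments merge into one composite sequence."""
    return len_a_bp + len_b_bp - composite_bp


dd2_bp = 19_100        # "19.1 kb" of continuous Dd2 sequence (rounded in the text)
pfyed9_bp = 22_200     # "22.2 kb" of continuous PfYED9 sequence (rounded in the text)
composite_bp = 40_171  # composite sequence length

overlap = implied_overlap_bp(dd2_bp, pfyed9_bp, composite_bp)
print(overlap)  # 1129, i.e. roughly 1.1 kb given the rounding of the inputs
```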

Structure of a var gene cluster and comparative analysis of predicted amino acid sequences. The 40,171
bp sequence contains three 10-12 kb regions that have related sequences and structure. Each of these regions
harbors a pair of ORFs. The first ORF in each pair begins with a consensus ATG start codon preceded by typical
P. falciparum non-coding sequence of abundant A+T content. The ORFs of each pair are separated by an intervening
AT-rich and non-coding sequence of 0.9 kb to 1.1 kb. Presence of consensus intron-exon splice junction sequences
at either end of these intervening sequences and lack of a consistent translation start site in the 3' ORF indicate
that each pair of ORFs belongs to an individual gene having a two exon structure. This has been verified by
comparison of the genomic sequences to the cDNA sequence of an expressed gene (var-7; see subsequent section).
The three 10 kb to 12 kb regions thus contain members of a variant gene family which have coding regions of
9.23 kb (var-1), 7.99 kb (var-2) and 9.01 kb (var-3). Predicted molecular weights of the encoded proteins are 350
kD, 302 kD and 344 kD, respectively.

The var genes are flanked by additional members of the var family in PfYED9. Restriction analysis
identified two additional genes that are 12-35 kb upstream of the sequenced region and are closely related to var-2
and var-3 (var-2c and var-3c, Figure 5). The var genes thus have a clustered arrangement in which many individual
members are organized in head-to-tail fashion. Between var-1 and var-2 is a 5 kb DNA sequence that harbors a short
ORF homologous to that of a repetitive element (rif-1) suggested to be a transposable element in P. falciparum.

The deduced protein sequences of the var genes are highly diverse, yet all contain certain
conserved motifs and common structural features. Database searches identified 2 to 4 domains within each var
sequence that are homologous to cysteine-rich domains of SABP and DABP. In the sequences, the first domain
near the amino-terminus (DBL domain 1) is the most conserved of the DBL domains and has amino acid signatures
that differentiate it from subsequent domains (e.g. consensus peptide sequences GAcAp[Y/F]rrL,
CTxLARsfadlgdlVrgrdLyLG and VPTYFDYVpqylrwF). Between DBL domains 1 and 2 is another type of conserved
domain, a cysteine-rich interdomain region (CIDR) of 300-400 amino acids. The CIDR does not have all the motifs
of a DBL domain, but it does have a region at the 3' end which is homologous to the end of the F1 DBL domain in
SABP. The conservation evident in the sequences of DBL domain 1 and the CIDR suggests that these regions maintain
important structures in the head of the variant molecule.

DBL domains 2, 3 and 4 (numbering is according to var-1, the first sequence completed) have
less discriminating signatures than domain 1, and show features of cross-alignment and variation in number that
suggest these domains can undergo shuffling and deletion.

DBL domain 4 is followed by a segment of variable length and a hydrophobic region that is
encoded at the end of the first exon (exon I). In all var sequences this hydrophobic region fits the criteria of a
transmembrane segment. The second exon (exon II) encodes a large (45-55 kD) conserved C-terminal sequence that
has an acidic character (predicted pI 4.5, vs. 5.9 for the part of the protein upstream of the splice junction) and
a cysteine content of < 1% (vs. > 4% upstream). The position of this C-terminal sequence downstream of a
single transmembrane segment suggests that it has a cytoplasmic location.
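
The cysteine content comparison can be reproduced from a three-letter amino acid listing of the kind used in the sequence listing. The following Python sketch is illustrative only: the function name and the toy input are assumptions, and the split between the regions upstream and downstream of the splice junction would have to be supplied by the reader.

```python
def cysteine_percent(three_letter_listing):
    """Percentage of Cys residues in a whitespace-separated three-letter listing.

    Purely numeric tokens (position labels such as 5, 10, 15) are ignored.
    """
    residues = [tok for tok in three_letter_listing.split() if tok.isalpha()]
    if not residues:
        return 0.0
    return 100.0 * sum(tok == "Cys" for tok in residues) / len(residues)


# Hypothetical fragment, not taken from the sequence listing.
toy = """
Met Lys Cys Asn Ile Ser Cys Tyr
1               5
"""

print(cysteine_percent(toy))  # 25.0 (2 Cys out of 8 residues)
```

Running the function separately on the exon I and exon II portions of a var protein would give the two percentages contrasted above.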

No consensus signal sequence was detected in the NH2-terminal region of the predicted var ORFs.
We note the presence of several motifs in the protein sequences that are known to act as ligands and receptors in
the integrin family. These include RGD (var-1 codons 886-88, 1992-94) and DGEA (var-1 codons 2111-14). Not
all of these motifs occur in each protein sequence and, when they do occur, their positions vary.
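
Motif positions such as those quoted for RGD and DGEA can be located with a simple scan of the one-letter protein sequence. The sketch below is a minimal Python illustration; the function name and the toy sequence are assumptions and do not correspond to any var sequence.

```python
def find_motif(protein, motif):
    """Return (start, end) codon positions, 1-based and inclusive, of each hit.

    Overlapping occurrences are reported.
    """
    hits = []
    start = protein.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = protein.find(motif, start + 1)
    return hits


# Hypothetical sequence for illustration only.
toy = "MKRGDAAADGEAKKRGD"

print(find_motif(toy, "RGD"))   # [(3, 5), (15, 17)]
print(find_motif(toy, "DGEA"))  # [(9, 12)]
```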

Identification of var transcripts and chromosome expression sites. To identify transcribed var sequences
we screened a λgt10 Dd2 cDNA library with var-containing BssHII restriction fragments that had been purified from
PfYED9 and radiolabeled by random hexamer priming. This screening yielded 18 clones with inserts that hybridized
back to PfYED9. By cross-hybridization studies and DNA sequence analysis the inserts fell into two groups: group
I inserts that aligned with sequences of var exon I (λT240, λT244, λT284, λT287, λT288, λT295, λT296);
and group II inserts that aligned with sequences of var exon II (λT140, λT141, λT142, λT145, λT147, λT148,
λT150, λT152).

The full ORF of an expressed var gene (var-7) was determined from λT242 and overlapping cDNA
clones that were obtained by a PCR-based walking strategy. The sequence showed that var-7 has a 6.6 kb ORF
containing two DBL domains, a hydrophobic transmembrane sequence and carboxy-terminal region typical of var genes
(predicted molecular weight 249 kD). Comparison of var-7 with the var-1 sequence demonstrated continuity of the
alignments at the predicted splice junction between the ORFs of exons I and II. PCR amplification of Dd2 genomic
DNA was also performed with primers derived from the two var-7 exons. Sequence of this var-7 PCR product
confirmed consensus splice sites and a 1 kb intron typical of the var genes. Transcription of var-7 was detected
as a 7.5 kb band by RNA blot analysis.

Chromosome mapping experiments with a var-7-specific probe localized the var-7 gene to a region
that is 600 kb from one end of Dd2 chromosome 12 (chromosome 12 has a length of 2600 kb). No hybridization
of the var-7 probe was detected to any other Dd2 chromosome nor to any chromosomes of the HB3, 3D7 or A4
parasites. Other cDNA inserts from the group I clones were also sequenced and examined for chromosome
hybridization signals. The λT240 cDNA insert mapped to the var-1/var-2/var-3 cluster on Dd2 chromosome 7 and
its sequence matched that of var-2. The λT244, λT284, λT287, λT288, λT295 and λT296 inserts all showed
overlapping sequences and yielded the same hybridization patterns. Chromosome sites recognized by these inserts
included regions within two SmaI fragments from Dd2 chromosome 7 and another from chromosome 9. We note
that loss of a cytoadherence phenotype has been correlated with a chromosome 9 deletion in certain P. falciparum
lines.

1.8 kb to 2.4 kb RNA transcripts related to var exon II. In addition to the 7.5 kb var-7 band, a broad 1.8
kb to 2.4 kb band was detected on RNA blots after hybridization with a probe that recognizes var exon II.
Sequences of eight group II cDNA inserts homologous to exon II were therefore determined and aligned against the
var genes. Comparative analysis of the insert sequences showed that all differed from one another in regions of
overlap, indicating that transcription of the corresponding RNAs was from different loci. Three of the cDNA
sequences (λT140, λT141 and λT148) aligned downstream of the intron/exon II splice junction. However, five other
cDNA inserts (λT142, λT145, λT147, λT150 and λT152) had sequences that aligned upstream of the var
intron/exon II splice site and included regions homologous to var intron sequences. In the vicinity of the splice
junction, consensus splice sites occurred in three of the cDNA sequences (λT142, λT147, λT150) while a fourth
sequence (λT145) showed the required AG dinucleotide but not the expected pyrimidine tract of the splice consensus.
The part of the fifth sequence (λT152) that aligned with the var intron extended upstream only to the TAG of the
splice sequence. All five sequences lacked a consensus start codon preceded by A+T-rich non-coding DNA that is
typical of P. falciparum translation start sites.
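
The two features assessed at the 3' splice site, the terminal AG dinucleotide and an upstream pyrimidine tract, can be expressed as a simple check. The Python sketch below is a hedged illustration rather than the criterion applied in this work: the window length of 12 bases and the 75% pyrimidine threshold are arbitrary assumptions, as are the function name and the example sequences.

```python
def acceptor_site_report(intron_3prime, tract_len=12, min_pyrimidine=0.75):
    """Inspect the 3' end of an intron for AG and an upstream pyrimidine tract.

    tract_len and min_pyrimidine are assumed values chosen for illustration.
    """
    seq = intron_3prime.upper()
    has_ag = seq.endswith("AG")
    # Examine the bases immediately upstream of the terminal dinucleotide.
    window = seq[-(tract_len + 2):-2] if has_ag else seq[-tract_len:]
    pyr = sum(base in "CT" for base in window) / len(window) if window else 0.0
    return {
        "has_AG": has_ag,
        "pyrimidine_fraction": pyr,
        "consensus_like": has_ag and pyr >= min_pyrimidine,
    }


# Hypothetical intron ends for illustration only.
print(acceptor_site_report("AAATTTTTTTTTTTTTAG"))  # AG present, T-rich tract
print(acceptor_site_report("GAGAGAGAGAGAGAAG"))    # AG present, no pyrimidine tract
```

The second example mirrors the situation described for the fourth cDNA sequence: the AG dinucleotide is present but the tract upstream of it is not pyrimidine-rich.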

Isolate-specific var sequences and evidence for DNA recombination in cultivated parasite clones. The
diversity of var forms expressed by P. falciparum parasites reflects a tremendous repertoire in the var gene family.
This repertoire is evident in the patterns of restriction polymorphism detected by var probes as well as in the
detection of var-specific sequences that hybridize to some parasite DNAs but not to others. The var-7 gene
expressed by Dd2, for example, is not present in the HB3, 3D7 or A4 genomes. Such var diversity suggests that
frequent DNA rearrangements underlie the production of antigenically variant forms.

To test for DNA rearrangements in parasites cultivated in vitro, we used var sequences to probe
restricted DNAs from Dd2 lines adapted to neuraminidase-treated erythrocytes. In one rearrangement a novel 35
kb B g A fragment is seen in NM1 DNA probed with the λT142 (group II) insert. In another rearrangement a deletion
of a 20 kb PstI band is evident in NM8 DNA probed with a var-7 sequence. Deletion of this 20 kb band was also
detected in the Dd2/R8 subclone obtained before neuraminidase selection, indicating that the DNA rearrangement was
not produced by selection in neuraminidase-treated erythrocytes.

The above examples are provided to illustrate the invention, and other variants of the invention
encompassed by the claims will be readily apparent to one of ordinary skill in the art.
SEQUENCE LISTING

(1) GENERAL INFORMATION: 

(i) APPLICANT: The United States, As Represented by the 
Secretary, Department of Health and Human Services 

(ii) TITLE OF INVENTION: BINDING DOMAINS FROM PLASMODIUM VIVAX 
AND PLASMODIUM FALCIPARUM ERYTHROCYTE BINDING PROTEINS 

(iii) NUMBER OF SEQUENCES : 45 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Knobbe Martens Olson & Bear 

(B) STREET: 620 Newport Center Drive 16th Floor 

(C) CITY: Newport Beach 

(D) STATE: California 

(E) COUNTRY: US 

(F) ZIP: 92660 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US08/487826 

(B) FILING DATE: 07-JUN-1996 

(viii) ATTORNEY/AGENT INFORMATION : 

(A) NAME: Israelsen, Ned 

(B) REGISTRATION NUMBER: 29,655 

(C) REFERENCE/DOCKET NUMBER: NIH121.001QPC

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (619) 235-8550 

(B) TELEFAX: (619) 235-0176 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4084 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium vivax 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AAGCTTTTAA AAATAGCAAC AAAATTTCGA AACATTGCCA CAAAAATTTT ATGTTTTACA 60 

TATATTTAGA TTCATACAAT TTAGGTGTAC CCTGTTTTTT OATATATGCG CTTAAATTTT 120 
^™™ TC AT ATGTTTAG TTATATGTGT AGAACAACTT GCTGAATAAA TTAPfTarar o n

TGCAGTAGAG GGCGATTTAT TACTTAAGTT GAATAACTAC AGATATAACA AaSS™ 
CAAGGATATA AGATGGAGTT TGGGAGATTT TGGAGATATA ATTATGGGAA rorJSS? J?™ 
AGGCATCGGA TATTCCAAAG TAGTGGAAAA TAATTTGCGC AgStSSS SS^IS^ ^on 

^^ccaa cagcgtcgta aacagtggtg gaatgaatct a^Scaa? S?SacS? lllo 
aatgatgtac tcagttaaaa aaagattaaa ggggaatttt atatggattt ?tI??^^^ 
-ES52£? A ^tagaac cgcagatata taS?5gI?? S£g3S££ ga53S™ i|2o 

CGTGTCAGAA TTGCCCACAG AAGTGCAAAA ACTGAAAGAA AAATPTrarp ZS?^ 
™355* T ^^^TAT GTAAGGTAGC ACCATGTCAA A^gSSotS S^ISS 'fgS 
ESSSS*™ ACCAGAAAAA AAAATCAATG GGATGTTCTG TGAAA?aAA? JSSSSSJ l?« 
AAAA £ AGGCA GAAAAGGTTC AGACGGCAGG TATCGTAACT CCTTATGATA TActJSS ie2S 
GGAGTTAGAT GAATTTAACG AGGTGGGTTT TGAGAATGAA ATTAACAAAC GtStgttS? ^522 
•' rISI*I^ 'F ATOpSm GTTCCGTTGA AGAGGCTAAA AAAAaSSc SSSct JSIS 
GACAATGCTG CTAAATCTCA GGCCACCAAT TCAAATCCGA TAActSgc? llln 
■SSSSI™ AGTAAAGCGG AGAAGGTTCC AGGAGATTCT ACGCATGGAA ATOnSSS 2o!o 
I^ CGAAGA T AGTTCTACCA CAGGTAAAGC TGTTACGGGG GATGGTCAAA ATGGAAATCA 2100 
GACACCTGCA GAAAGCGATG TACAGCGAAG TGATATTGCC GAAAGTGTAA rTrn^a a* 
TGTTGATCCG CAGAAATCTG TAAGTAAAAG AAGTGACGAC AC^S^SJS JSSSgU? IsfS 

SESSS SSSSSSS ?™i Si 

SBSEg ss sssssssssss si SI 

■SgjSSS JSS^S 233SS dlSSlll 

J? AGGA TAGG GATAAAACTG TAGGAAGTAA AGATGGAGGG GGGGAAGATA ACTC^SS 26« 
^ AAGGATGCA GCGACTGTAG TTGGTGAGGA TAGAATTCGT GAGAACAG^CG CTCCTGGTAG 2700 
CACTAATGAT AGATCAAAAA ATGACACGGA AAAGAACGGG GCCTCTACCC CTOACtoi? 
JS^^GAG GATGCAACTG CGCTAAGTAA AACCGAAAGT tSSSJSS CAG^SgtSg* 282? 
SSSS*^ ACTAATGATA CAACTAACAG TTTAGAAAAT AAAAATGGAG GAAAAGAAAA lllo 
^™ ACAA AAGCATGATT TTAAAAGTAA TGATACGCCG AATGAAGAAC CAAATTCTGA llll 
rir&»spp^ GATGCAGAAG GACATGACAG GGATAGCATC AAAAATGATA AAGCAGAAAG 3 2 00 
GAGAAAGCAT ATGAATAAAG ATACTTTTAC GAAAAATACA AATAGTCACC ATTTAAATAG loll 
^^TAAT TTGAGTAATG GAAAATTAGA TATAAAAGAA TACAAATACA GaStctS^ llll 
GAAGATATTA TATTAATGTC TTCAGTACGC AAGTGCAACA ATAATAtSS 3^80 
ESS??™ TGTAAC TCTG TAGAGGACAA AATATCATCG AATACTTGTT CTAGAGAGAA Mil 
AAGTAAAAAT TTATGTTGCT CAATATOGGA TTTTTGTTTG AACTATTTTG ACGTGTATTr »S 
TTATGAGTAT CTTAGCTGCA TGAAAAAGGA ATTTGAAGAT CCATCCTaS AgSSS£ Wits 
SESSSSJ T™**^™ TGGAGAAAAA GATGCTGAAT SSSSSSg? SSSSSSS 3420 
TTAAAAAGGA ATTAATTTTA GGAATGTTAT AAACATTTTT GTACCCAAAA ^OTTTTTGr llfln 
■■{SSfiSS T ACTTTGCOG CGGCGGGAGC GTTGCTGATA SSSSiJS* Isll 
AAGGAAGATG ATC AAAAATG AGTAACCAGA AAATAAAATA AAATAACATA AAATAAAATA 3600 
AAAACTAGAA TAACAATTAA AATAAAATAA AATGAGAAAT GCCTGTTAAT gI^CACTt'S 366^ 
^™S GAT TCCATTTGTG AAGTTTTAAA GAGAGCACAA ATGCATAGTC ATTAtS™ 3720 
TGCATATATA CACATATATG TACGTATATA TAATAAACGC ACACTTTCTT SScGTAcS lllo 
TTCTGAAGAA GCTACATTTA ATGAGTTTGA AGAATACTGT GATAATATTC ACAGAATCCC 384? 
TCTGATGCCT AACAGTAATT CAAATTTCAA GAGGAAAATT CCATTTAAAA AGAAATGTTA 3900 
CATCATTTTG CGTTTTTCTT TTTTTCTTTT TTTTTTCTTT TTTAGATATT GA^SSc 3960 
AGCCATCAAC CCCCCTGGAT TATTCATGAT GCTACTTTGG TAAGTAAAAG CAATTCTGAT 4020
TGTAGTGCTG ATGTAATTTT AGTCATTTTG CTTGCTGCAA TAAACGAGAA AATATATCAA 4080
GCTT 4084

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1115 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium vivax

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Lys Gly Lys Asn Arg Ser Leu Phe Val Leu Leu Val Leu Leu Leu
1               5                   10                  15

Leu His Lys Val Ser Tyr Lys Asp Asp Phe Ser Ile Thr Leu Ile Asn
            20                  25                  30

Tyr His Glu Gly Lys Lys Tyr Leu Ile Ile Leu Lys Arg Lys Leu Glu
        35                  40                  45

Lys Ala Asn Asn Arg Asp Val Cys Asn Phe Phe Leu His Phe Ser Gln
    50                  55                  60

Val Asn Asn Val Leu Leu Glu Arg Thr Ile Glu Thr Leu Leu Glu Cys
65                  70                  75                  80

Lys Asn Glu Tyr Val Lys Gly Glu Asn Gly Tyr Lys Leu Ala Lys Gly
                85                  90                  95

His His Cys Val Glu Glu Asp Asn Leu Glu Arg Trp Leu Gln Gly Thr
            100                 105                 110

Asn Glu Arg Arg Ser Glu Glu Asn Ile Lys Tyr Lys Tyr Gly Val Thr
        115                 120                 125

Glu Leu Lys Ile Lys Tyr Ala Gln Met Asn Gly Lys Arg Ser Ser Arg
    130                 135                 140

Ile Leu Lys Glu Ser Ile Tyr Gly Ala His Asn Phe Gly Gly Asn Ser
145                 150                 155                 160

Tyr Met Glu Gly Lys Asp Gly Gly Asp Lys Thr Gly Glu Glu Lys Asp
                165                 170                 175

Gly Glu His Lys Thr Asp Ser Lys Thr Asp Asn Gly Lys Gly Ala Asn
            180                 185                 190

Asn Leu Val Met Leu Asp Tyr Glu Thr Ser Ser Asn Gly Gln Pro Ala
        195                 200                 205

Gly Thr Leu Asp Asn Val Leu Glu Phe Val Thr Gly His Glu Gly Asn
    210                 215                 220

Ser Arg Lys Asn Ser Ser Asn Gly Gly Asn Pro Tyr Asp Ile Asp His
225                 230                 235                 240

Lys Lys Thr Ile Ser Ser Ala Ile Ile Asn His Ala Phe Leu Gln Asn
                245                 250                 255

Thr Val Met Lys Asn Cys Asn Tyr Lys Arg Lys Arg Arg Glu Arg Asp
            260                 265                 270

Trp Asp Cys Asn Thr Lys Lys Asp Val Cys Ile Pro Asp Arg Arg Tyr
        275                 280                 285

Gln Leu Cys Met Lys Glu Leu Thr Asn Leu Val Asn Asn Thr Asp Thr
    290                 295                 300

Asn Phe His Arg Asp Ile Thr Phe Arg Lys Leu Tyr Leu Lys Arg Lys
305                 310                 315                 320

Leu Ile Tyr Asp Ala Ala Val Glu Gly Asp Leu Leu Leu Lys Leu Asn
                325                 330                 335

Asn Tyr Arg Tyr Asn Lys Asp Phe Cys Lys Asp Ile Arg Trp Ser Leu
            340                 345                 350
Gly Asp Phe Gly Asp Ile Ile Met Gly Thr Asp Met Glu Gly Ile Gly
        355                 360                 365

Tyr Ser Lys Val Val Glu Asn Asn Leu Arg Ser Ile Phe Gly Thr Asp
    370                 375                 380

Glu Lys Ala Gln Gln Arg Arg Lys Gln Trp Trp Asn Glu Ser Lys Ala
385                 390                 395                 400

Gln Ile Trp Thr Ala Met Met Tyr Ser Val Lys Lys Arg Leu Lys Gly
                405                 410                 415

Asn Phe Ile Trp Ile Cys Lys Leu Asn Val Ala Val Asn Ile Glu Pro
            420                 425                 430

Gln Ile Tyr Arg Trp Ile Arg Glu Trp Gly Arg Asp Tyr Val Ser Glu
        435                 440                 445

Leu Pro Thr Glu Val Gln Lys Leu Lys Glu Lys Cys Asp Gly Lys Ile
    450                 455                 460

Asn Tyr Thr Asp Lys Lys Val Cys Lys Val Pro Pro Cys Gln Asn Ala
465                 470                 475                 480

Cys Lys Ser Tyr Asp Gln Trp Ile Thr Arg Lys Lys Asn Gln Trp Asp
                485                 490                 495

Val Leu Ser Asn Lys Phe Ile Ser Val Lys Asn Ala Glu Lys Val Gln
            500                 505                 510

Thr Ala Gly Ile Val Thr Pro Tyr Asp Ile Leu Lys Gln Glu Leu Asp
        515                 520                 525

Glu Phe Asn Glu Val Ala Phe Glu Asn Glu Ile Asn Lys Arg Asp Gly
    530                 535                 540

Ala Tyr Ile Glu Leu Cys Val Cys Ser Val Glu Glu Ala Lys Lys Asn
545                 550                 555                 560

Thr Gln Glu Val Val Thr Asn Val Asp Asn Ala Ala Lys Ser Gln Ala
                565                 570                 575

Thr Asn Ser Asn Pro Ile Ser Gln Pro Val Asp Ser Ser Lys Ala Glu
            580                 585                 590

Lys Val Pro Gly Asp Ser Thr His Gly Asn Val Asn Ser Gly Gln Asp
        595                 600                 605

Ser Ser Thr Thr Gly Lys Ala Val Thr Gly Asp Gly Gln Asn Gly Asn
    610                 615                 620

Gln Thr Pro Ala Glu Ser Asp Val Gln Arg Ser Asp Ile Ala Glu Ser
625                 630                 635                 640

Val Ser Ala Lys Asn Val Asp Pro Gln Lys Ser Val Ser Lys Arg Ser
                645                 650                 655

Asp Asp Thr Ala Ser Val Thr Gly Ile Ala Glu Ala Gly Lys Glu Asn
            660                 665                 670

Leu Gly Ala Ser Asn Ser Arg Pro Ser Glu Ser Thr Val Glu Ala Asn
        675                 680                 685

con ° ly ASP ASP ThT Val Asn Ser Ala Se^ lie Pro* Val Val Ser 

695 700 
Gly Glu Asn Pro Leu Val Thr Pro Tyr Asn Gly Leu Arg His Ser Lys
705                 710                 715                 720

Asp Asn Ser Asp Ser Asp Gly Pro Ala Glu Ser Met Ala Asn Pro Asp
                725                 730                 735

Ser Asn Ser Lys Gly Glu Thr Gly Lys Gly Gln Asp Asn Asp Met Ala
            740                 745                 750

Lys Ala Thr Lys Asp Ser Ser Asn Ser Ser Asp Gly Thr Ser Ser Ala
        755                 760                 765

Thr ?S ASP Thr Thr ASP Ala Val AsD Glu Ile Asn Lys Gly Val 

770 775 780 

Pro Glu Asp Arg Asp Lys Thr Val Gly Ser Lys Asp Gly Gly Gly Glu
785                 790                 795                 800

Asp Asn Ser Ala Asn Lys Asp Ala Ala Thr Val Val Gly Glu Asp Arg
                805                 810                 815

Ile Arg Glu Asn Ser Ala Gly Gly Ser Thr Asn Asp Arg Ser Lys Asn
            820                 825                 830

Asp Thr Glu Lys Asn Gly Ala Ser Thr Pro Asp Ser Lys Gln Ser Glu
        835                 840                 845

Asp Ala Thr Ala Leu Ser Lys Thr Glu Ser Leu Glu Ser Thr Glu Ser
    850                 855                 860

Gly Asp Arg Thr Thr Asn Asp Thr Thr Asn Ser Leu Glu Asn Lys Asn
865                 870                 875                 880

Gly Gly Lys Glu Lys Asp Leu Gln Lys His Asp Phe Lys Ser Asn Asp
                885                 890                 895

Thr Pro Asn Glu Glu Pro Asn Ser Asp Gln Thr Thr Asp Ala Glu Gly
            900                 905                 910

His Asp Arg Asp Ser Ile Lys Asn Asp Lys Ala Glu Arg Arg Lys His
        915                 920                 925

Met Asn Lys Asp Thr Phe Thr Lys Asn Thr Asn Ser His His Leu Asn
    930                 935                 940

Ser Asn Asn Asn Leu Ser Asn Gly Lys Leu Asp Ile Lys Glu Tyr Lys
945                 950                 955                 960

Tyr Arg Asp Val Lys Ala Thr Arg Glu Asp Ile Ile Leu Met Ser Ser
                965                 970                 975

Val Arg Lys Cys Asn Asn Asn Ile Ser Leu Glu Tyr Cys Asn Ser Val
            980                 985                 990

Glu Asp Lys Ile Ser Ser Asn Thr Cys Ser Arg Glu Lys Ser Lys Asn
        995                 1000                1005

Leu Cys Cys Ser Ile Ser Asp Phe Cys Leu Asn Tyr Phe Asp Val Tyr
    1010                1015                1020

Ser Tyr Glu Tyr Leu Ser Cys Met Lys Lys Glu Phe Glu Asp Pro Ser
1025                1030                1035                1040

Tyr Lys Cys Phe Thr Lys Gly Gly Phe Lys Ile Asp Lys Thr Tyr Phe
                1045                1050                1055

Ala Ala Ala Gly Ala Leu Leu Ile Leu Leu Leu Ile Ala Ser Arg Lys
            1060                1065                1070

Met Ile Lys Asn Asp Ser Glu Glu Ala Thr Phe Asn Glu Phe Glu Glu
        1075                1080                1085

Tyr Cys Asp Asn Ile His Arg Ile Pro Leu Met Pro Asn Asn Ile Glu
    1090                1095                1100

His Met Gln Pro Ser Thr Pro Leu Asp Tyr Ser
1105                1110                1115

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4507 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:



TATATATATA TATATATATA GATAATAACA TATAAATATA TTCAATGTGC ATACAATGAA 60 
ATGTAATATT AGTATATATT TTTTTGCTTC CTTCTTTGTG TTATATTTTG CAAAAGCTAG 120 
GAATGAATAT GATATAAAAG AGAATGAAAA ATTTTTAGAC GTGTATAAAG AAAAATTTAA 180 
TGAATTAGAT AAAAAGAAAT ATGGAAATGT TCAAAAAACT GATAAGAAAA TATTTACTTT 240 
TATAGAAAAT AAATTAGATA TTTTAAATAA TTCAAAATTT AATAAAAGAT GGAAGAGTTA 300 
TGGAACTCCA GATAATATAG ATAAAAATAT GTCTTTAATA AATAAACATA ATAATGAAGA 360 
AATGTTTAAC AACAATTATC AATCATTTTT ATCGACAAGT TCATTAATAA AGCAAAATAA 420 
ATATGTTCCT ATTAACGCTG TACGTGTGTC TAGGATATTA AGTTTCCTGG ATTCTAGAAT 480 
TAATAATGGA AGAAATACTT CATCTAATAA CGAAGTTTTA AGTAATTGTA GGGAAAAAAG 540 
GAAAGGAATG AAATGGGATT GTAAAAAGAA AAATGATAGA AGCAACTATG TATGTATTCC 600
TGATCGTAGA ATCCAATTAT GCATTGTTAA TCTTAGCATT ATTAAAACAT ATACAAAAGA 660 
GACCATGAAG GATCATTTCA TTGAAGCCTC TAAAAAAGAA TCTCAACTTT TGCTTAAAAA 720 
AAATGATAAC AAATATAATT CTAAATTTTG TAATGATTTG AAGAATAGTT TTTTAGATTA 780
JS^ KrCTr GCTA TGGGAA ATGATATGGA TTTTGGAGGT TATTCAACTA AGGCAr aa a a a * n
CAAAATTCAA GAAGTTTTTA AAGGGGCTCA TGGGGAAATA AGTGAACATA AAATT a aaa» Itl 
5522^* GAATGGTGGA ATGAATTTAG AGAGAAACTT TOoSmSoS ^StcJgA III 
K ?S^^^ T AATATAAATA ATTGTAAAAA TATTCCCCAA CSaTgA^AC A^CTACT^ 1 of S 
AA&s&PTiaa GAATGGCATG GAGAATTTTT GCTTGAAAGA GATAATAGa5 oSSSm ^080 
AAAAAGTAAA TGTAAAAATA ATACATTATA TGAAGCATGT GAGAAGGAAT GTATTGATrS 
ATGTATGAAA TATAGAGATT GGATTATTAG AAGTAAATTT GAATGGCATA SSK» 
AraaliPaar JS?**™ TTCCAAAG GA AAATGCgSS S^SSS SaAA^TO u£S 
Ifl tf^S^S AATGATGCTA AAGTAAGTTT ATTATTGAAT AATTGTGATG CTGAATaSS if IS 

SXSSIS gattgtaaac atactactac tctcgttaaa agcgttttaa atgctSSS Win 
S^S^ 1 * aaggaaaagc gtgaacatat tgatttagat gatttttcta aatttg^S* \lll 
l^l^^l tccgttgata caaacacaaa ggtgtgggaa tgtaaaaacc cS?S?? \tnl 

TariiT»T»p GATGTATGTG TACCTCCGAG GAGGCAAGAA TTATGTCTTG SSStJgI 1^60 
15 S^**™— 0 ^TAAAAACC TATTAATGAT AAAAGAG CAT ATTCTTGCTA TTGCAATATA Ull 
1 TGAATCAAGA ATATTGAAAC GAAAATATAA GAATAAAGAT GATAAAGAAG TraSXiS i2? 

^AAATAAA ACTTTCGCTG ATATAAGAGA TATTATAGGA GGTAcJSJ? ATOGA^ JSIS 
TTTGAGCAAT AGAAAATTAG TAGGAAAAAT TAACACAAAT TCAAAATATG TTrSSSJ^ rl™ 
?S^ T <aTMGOTT TTCGTGATGA GTGGTGGAAA gSa^AaII SgaSStS llfiO 

20 25JSSS^ I CA IS GGTAT tcaaggataa aactgtttgt aaagaagatg atattgaaaa JUS 

a^a^RnwvT^ TTCTTCAGAT GGTTTAGTGA ATGGGGTGAT GATTATTGCC AGGATAAAAC lffifl 
AAAAATGATA GAGACTCTGA AGGTTGAATG CAAAGAAAAA CCTTGTGA ap ?tn%z£!£z 
IcAArrraii TGTAATTCAT ATAAAGAATG GATATCAAAA AAAAAAGAAG A^TATAATAA 2100 
ACAAGCCAAA CAATACCAAG AATATCAAAA AGGAAATAAT TACAAAATGT ATTrTraa^ o^n 
TAAATCTATA AAACCAGAAG TTTATTTAAA GAAATACTCG GAAAAATgS 

JESSES- GAATTTAAGG AAGAATTACA tSSKSS SaAA^aIS SfoS 3Sb 
ISa^S™ AAGGATGTAC CAATTTCTAT AATAAGAAAT AATGAACAAA CTTCgSXgA 2340 
J^TTCCT GAGGAAAACA CTGAAATAGC ACACAGAACG GAAACTCCAT CTATCTCTGA 2400 
GGAAATGAAC AAAAAGAACG TGATGACGAT AGTTTGAGTA AAATAAGTGT 2460 
,„ ^^CAGAA AATTCAAGAC CTGAAACTGA TGCTAAAGAT ACTTCTAACT TG^AAAaS Itll 
•SiiSSSSSS flpATATTA GTATGCCTAA AGCAGTTATT GGGAGCAGTC CTAATGMAA 2580 
^^^^ ACTGAACAAG GGGATAATAT TTCGGGGGTG AATTCTAAAC CTTTATCTPA ?«!n 
TGATCTACGT CCAGATAAAA AGGAATTAGA AGATGAAAAT AGTgSgAAT SgAaSSaS Inal 
■S3SS!JSf ' ^ATATCAA AAAGTCCATC TATAAATAAT GGAGAtSS? SSSSSiSS 2^60 
35 tSS^* GTGAGTGAAT CTAGTAGTTC AAATACTGGA TTGTCTATTG ATGA^S lain 
a^S™ ACATTTGTTC GAACACAAGA TACAGCAAAT ACTGAAGATG TTATTAGAAA llll 
AGAAAATGCT GACAAGGATG AAGATGAAAA AGGCGCAGAT GAAGAAAGAC ATACTACTt? llln 
■ISJS2SU A AGTTCACCTG AAGAAAAAAT GTTAACTGAT AATGAAGGAG GAAA^AGTTT 3000 
AAATCATGAA GAGGTGAAAG AACATACTAG TAATTCTGAT AATGTTGAAC AGtSSaS ™2n 
40 JSJSIS™ ATGAATGTTG AGAAAGAACT AAAAGAtSct ?SS5SS§ SSS?aS 3?20 
'"SISS^ GGA AAAGCAC ATGAAGAATT ATCAGAACCA AATCTAAGCA GTgSSSSS llll 
l^llV^ 1 ACACCTGGAC CTTTGGATAA CACCAGTGAA GAAACTACAG AAAGAATTAG 3240 
JAATAATGAA TATAAAGTTA ACGAGAGGGA AGATGAGAGA ACGCTTACTA AGGA^TATGA 3300 
JSSKTSn TTGAAAAGTC ATATGAATAG AGAATCAGAC GATGGTGAAT TATATGaSa* 33sS 
45 tJi^^. TTATCTACTG TAAATGATGA ATCAGAAGAC GCTGAAGCAA AAATGAAAGG 342^ 
45 AAATGATACA TCTGAAATGT CGCATAATAG TAGTCAACAT ATTGAGAGTG ATC^ATArAA ?lnn 
AAACGATATG AAAACTGTTG GTGATTTGGG AACCACACAT GTACAAAACG A^ATTAGTCT 354^ 
l^^ KCA GGAGAAATTG ATGAAAAATT AAGGGAAAGT AAAGAATCAA AAATX^SS 3^ n 
^IJAAGAG GAAAGATTAA GTCATACAGA TATACATAAA ATTAATCCTG aSSIJSJaI 3^60 
50 JS*™™ ^ACATTTAA AAGATATAAG AAATGAGGAA AACGAAAGAC ACTTAACTAA 3^20 
AATATTAGTC AAGAAAGGGA TTTGCAAAAA CATGGATTCC ATACCATCAA 3 7 SO 
TAATCTACAT GGAGATGGAG TTTCCGAAAG AAGTCAAATT AATCATAGTC ATcSJSSaI 3M0 
CA^CAAGAT CGGGGGGGAA ATTCTGGGAA TGTTTTAAAT ATGAGATCTA ATAATAATAA 3900 
TTTTAATAAT ATTCCAAGTA GATATAATTT ATATGATAAA AAATTAGATT TAGaJc™ 3^0 
55 1^^^ AATGATAGTA CAACAAAAGA ATTAATAAAG AAATTAGCAG AaStAAaS 402? 
aa?!^^ GAAATTTCTG TAAAATATTG TGACCATATG ATTCATGAAG AAATCCCATT toll 
AAAAACATGC ACTAAAGAAA AAACAAGAAA TCTGTGTTGT GCAGTATCAG ATTACTGTAT till 
GAGCTATTTT ACATATGATT CAGAGGAATA TTATAATTGT ACGAAAAGGG AATTTgISJ l^n 
TCCATCTTAT ACATGTTTCA GAAAGGAGGC TTTTTCAAGT ATGATATtS AaJSaS till 
60 Jf^^ A^ATATTATT ATTTTTATAC TTACAAAACT GCAaSgtS tl 20 
AATTAATTTC TCATTAATTT TTTTTTTCTT TTTTTCTTTT TAGGTATGCC ATATTATGCA till 
^CAGGTG TGTTATTTAT TATATTGGTT ATTTTAGGTG CTTCACAAGC SSS till 
1££?C%T^ AAATAAATAA AAATAAAATT GAGAAGAATG TAAATTAAAT ATAGAATTCG 4500 

4507 
(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1435 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



Met Lys Cys Asn Ile Ser Ile Tyr Phe Phe Ala Ser Phe Phe Val Leu
1               5                   10                  15

Tyr Phe Ala Lys Ala Arg Asn Glu Tyr Asp Ile Lys Glu Asn Glu Lys
            20                  25                  30

Phe Leu Asp Val Tyr Lys Glu Lys Phe Asn Glu Leu Asp Lys Lys Lys
        35                  40                  45

Tyr Gly Asn Val Gln Lys Thr Asp Lys Lys Ile Phe Thr Phe Ile Glu
    50                  55                  60

Asn Lys Leu Asp Ile Leu Asn Asn Ser Lys Phe Asn Lys Arg Trp Lys
65                  70                  75                  80

Ser Tyr Gly Thr Pro Asp Asn Ile Asp Lys Asn Met Ser Leu Ile Asn
                85                  90                  95

Lys His Asn Asn Glu Glu Met Phe Asn Asn Asn Tyr Gln Ser Phe Leu
            100                 105                 110

Ser Thr Ser Ser Leu Ile Lys Gln Asn Lys Tyr Val Pro Ile Asn Ala
        115                 120                 125

Val Arg Val Ser Arg Ile Leu Ser Phe Leu Asp Ser Arg Ile Asn Asn
    130                 135                 140

Gly Arg Asn Thr Ser Ser Asn Asn Glu Val Leu Ser Asn Cys Arg Glu
145                 150                 155                 160

Lys Arg Lys Gly Met Lys Trp Asp Cys Lys Lys Lys Asn Asp Arg Ser
                165                 170                 175

Asn Tyr Val Cys Ile Pro Asp Arg Arg Ile Gln Leu Cys Ile Val Asn
            180                 185                 190

Leu Ser Ile Ile Lys Thr Tyr Thr Lys Glu Thr Met Lys Asp His Phe
        195                 200                 205

Ile Glu Ala Ser Lys Lys Glu Ser Gln Leu Leu Leu Lys Lys Asn Asp
    210                 215                 220

Asn Lys Tyr Asn Ser Lys Phe Cys Asn Asp Leu Lys Asn Ser Phe Leu
225                 230                 235                 240

Asp Tyr Gly His Leu Ala Met Gly Asn Asp Met Asp Phe Gly Gly Tyr
                245                 250                 255

Ser Thr Lys Ala Glu Asn Lys Ile Gln Glu Val Phe Lys Gly Ala His
            260                 265                 270

Gly Glu Ile Ser Glu His Lys Ile Lys Asn Phe Arg Lys Glu Trp Trp
        275                 280                 285

Asn Glu Phe Arg Glu Lys Leu Trp Glu Ala Met Leu Ser Glu His Lys
    290                 295                 300

Asn Asn Ile Asn Asn Cys Lys Asn Ile Pro Gln Glu Glu Leu Gln Ile
305                 310                 315                 320

Thr Gln Trp Ile Lys Glu Trp His Gly Glu Phe Leu Leu Glu Arg Asp
                325                 330                 335

Asn Arg Ser Lys Leu Pro Lys Ser Lys Cys Lys Asn Asn Thr Leu Tyr
            340                 345                 350

Glu Ala Cys Glu Lys Glu Cys Ile Asp Pro Cys Met Lys Tyr Arg Asp
        355                 360                 365

Trp Ile Ile Arg Ser Lys Phe Glu Trp His Thr Leu Ser Lys Glu Tyr
    370                 375                 380










Glu 


Thr 


Gin 


Lys 


Val 


Pro 


Lys 


Glu 


Asn 


Ala Glu 


Asn 






Tl*» 

X X*w 


T.ve 
uy o 


385 










390 








395 








400 


lie 


Ser* 


Glu 


Asn 


Lvs 


Asn 


ASD 


Ala 


Lys 


Val Ser 


Leu 


Leu 


Leu 


As zi 


Asn 










405 










410 








415 




Cys 


Asp 


Ala 


Glu 


Tvr 


Sfer 


Lvs 


Tvr 
* 


Cys Asp Cys 


Lys 


His 


Thr 


Thr 


Thr 








420 










425 






430 






Leu 


Val 


Lys 


Ser 


Val 


Leu 


Asn 


Glv 


Asn Asp Asn 


Thr 


lie 


- Lys 


Glu 


Lys 






435 










440 








445 




Arcr 


Glu 


His 


He 


Asp 


Leu 


Asp 


Asp 


Phe 


Ser Lys 




wx y 


CjrD 


Act*) 
nop 


T.ve 

uy o 




450 










455 








460 




Asn 


Ser 


Val 


Asp 


Thr 


Asn 


Thr 


Lys 


Val 


Trp Glu 


Cys 


xj y o 


noli 


f x u 


iyr 


465 










470 








475 








480 


lie 


Leu 


Ser 


Thr 


Lys 


Asp 


Val 


Cys 


Val 


Pro Pro 




Ary 


Gl n 

V3XU 


Gl ti 

V3X IX 


T.on 
XJCIX 










485 










490 






495 




Cys 


Leu 


Gly Asn 


He 


Asp 


Arg 


He 


Tyr Asp Lys 


A nn 


XJC U 


XiC Li 




Tip 

lie 


















505 








510 






Lys 


Glu 


HIS 


lie 


Leu 


Ala 


He 


Ala 


lie 


Tyr Glu 


OCX 


Arg 


X 1c 


Leu 


- 

x^ys 
















520 








525 






Arcr 

AXV^ 


Lys 


Tyr 


Lys 


Asn 


Lys 


Asp 


Asp 


Lys 


Glu Val 


Cys 


. xj y o 


Tl© 

X xc 


Tie 

X 1C 


noli 




530 










535 








540 










Lys 


Thr 


Phe


Ala


Asp


Ile


Arg


Asp


Ile


Ile Gly


Gly


XXIX 


Act*) 
no 


iyr 


irp 


545 










550 








555 








560 


Asn 


Asp 


Leu 


Ser 


Asn 


Arg


Lys 


Leu 


Val 


Gly Lys 


Ile


Asn 


X XIX 


A en 
/loll 












565 










570 








575 




Lys 


Tyr

■* 


vax 


His


Arg


Asn 


Lys 


Lys 


Asn Asp Lys 


Leu 


Phe 


A.T*cr 


Act* 


Glu 

w X IX 








580










585 








590 




Trr> 

XT' 


Tro 


Lys 


Val


Ile


Lys 


Lys 


Asp 


Val 


Trp Asn 


Val 


Ile


Ser 


xxp 


Val 






595


.• - 








600 








605 








Phe 


Lys 


Asp 


Lys 


Thr 


Val 




Lys 


Glu Asp Asp 


Ile


Glu 

w X IX 


A en 


TT *a 
IXC 


Drri 
rj. \j 




610 










615 








620 










Gln


Phe 


Phe


Arg


Trr> 


Phe 


Ser 


Glu 


Trp 


Gly Asp 




iyt 


PVR . 

uys 


Gl n 

V3X11 


A en 
nop 


625 










630 








635 










640 


Lys 


Thr 


Lys 


Met 


Ile


Glu 


Thr 


Leu 


Lys 


Val Glu 


w jr o 


xiy o 


Gl ii 

wX U 


xjy o 


Dm 










645 










650 








655 




Cys 


Glu 


- 

Asp 


■ 

Asp 


Asn 


Cys 


Lys 


Ser 


Lys 


Cys Asn 




xyi 


Ujr O 


Gl ii 

VJX LI 


Trri 

x rp 








660










665 








670 






Ile

X X C 




Lys 


Lys 


j_i y o 


V7XU. 


Gl n 


xyr 


Asn 


Lys Gln


Al a 


xuy fc» 


Gl n 


_ 

xyr 


Gl n 






675 








680 








685






■ - 


Glu 


Tyr

xyir 


Gln


Lys 


Gl v 


A en 




j.yr 


Lys 


Met Tyr 


Co T* 
OCX 


GT ii 


rile 


Lys 






690 










695








700










IXC 


XJ y o 


Pro 


Glu 


Val 
vox 


lyr 


T .OH 
XJG U 


T Arcr 


Lys 


Tyr Ser 


pi ..m 

\J JL Li 


Lys 


Lys


Ser 


Asn 


705 










710 








715 










720 


Leu 


Asn 


Phe 


Glu 


Asp 


Glu 


Phe 


Lys 


Glu 


Glu Leu 


XIX O 


Cay* 

OCX 


Aciti 


A Y 1 


XJ jf o 










725 










730 








735




Asn 


Lys 


Cys 


Thr 


Met 


Cys 


Pro 


Glu 


Val 


Lys Asp 


Val 


Pro 


Ile


Ser 


X X c 








740 










745 








750 






Ile


Arg 


Asn 


Asn 


Glu 


Gln


Thr 


Ser 


Gln


Glu Ala 


Val 


Pro 


Glu 


Glu 


Asn 






755 










760 








765 








Thr 


Glu 


Ile


Ala 


His 


Arg


Thr 


Glu 


Thr 


Pro Ser 


Ile


Ser 


Glu 


Gly


Pro 




770 








775 








780 








Lys 


Gly

wo. y 


Asn 


Glu 


Gln


Lys


Glu 


Arg


Asp Asp Asp 


Ser 


Leu 


Ser 


Lys


Ile


785 










790 








795 










800 


Ser 


Val 


Ser 


Pro 


Glu 


Asn 


Ser 


Arg 


Pro 


Glu Thr 


Asp 


Ala 


Lys 


Asp 


Thr 










805 










810 








815 




Ser 


Asn 


Leu 


Leu 


Lys 


Leu 


Lys 


Gly 


Asp 


Val Asp 


Ile


Ser 


Met 


Pro 


Lys 








820 










825 








830 






Ala 


Val 


Ile


Gly 


Ser 


Ser 


Pro 


Asn 


Asp Asn Ile


Asn 


Val 


Thr 


Glu 


Gln






835 








840 








845 








Gly 


Asp 


Asn 


Ile


Ser 


Gly 


Val 


Asn 


Ser 


Lys Pro 


Leu 


Ser 


Asp 


Asp 


Val 




850 










855 








860 










Arg 


Pro 


Asp 


Lys 


Lys 


Glu 


Leu 


Glu 


Asp 


Gln Asn


Ser 


Asp 


Glu 


Ser 


Glu 


865 










870 








875 










880 


Glu 


Thr 


Val 


Val 


Asn 


His 


Ile


Ser 


Lys 


Ser Pro 


Ser 


Ile


Asn 


Asn 


Gly 









885










890 










895 




Asp 


Asp 


Ser Gly 
900 


Ser 


Gly 


Ser 


Ala 


Thr 
905 


Val 


Ser 


Glu 


Ser 


Ser 
910 


Ser 


Ser 


Asn 


Thr 


Gly Leu 
915 


Ser 


Ile


Asp 


Asp 
920 


Asp 


Arg 


Asn 


Gly 


Asp 
925 


Thr 


Phe 


Val 


Arg 


Thr 


Gln Asp


Thr 


Ala 


Asn 


Thr 


Glu 


Asp 


Val 


Ile


Arg 


Lys 


Glu 


Asn 




930 








935 










940 






Ala 


Asp 


Lys Asp 


Glu 


Asp 


Glu 


Lys 


Gly 


Ala 


Asp 


Glu 


Glu 


Arg 


His 


Ser 


945 








950 










955 








960 


Thr 


Ser 


Glu Ser 


Leu 
965 


Ser 


Ser 


Pro 


Glu 


Glu 
970 


Lys 


Met 


Leu 


Thr 


Asp 
975 


Asn 


Glu 


Gly 


Gly Asn 


Ser 


Leu 


Asn 


His 


Glu 


Glu 


Val 


Lys 


Glu 


His 


Thr 


Ser 






980 










985 








990 






Asn 


Ser 


Asp Asn 


Val 


Gln


Gln


Ser 


Gly 


Gly 


Ile


Val 


Asn 


Met 


Asn 


Val 






995 






1000 








1005 








Glu 


Lys 


Glu Leu


Lys 


Asp 


Thr 


Leu 


Glu 


Asn 


Pro 


Ser 


Ser 


Ser 


Leu 


Asp 


1010 






1015 








1020 








Glu 


Gly 


Lys Ala 


His 


Glu 


Glu 


Leu 


Ser 


Glu 


Pro 


Asn 


Leu 


Ser 


Ser 


Asp 


1025 






1030 








1035 








1040 


Gln


Asp 


Met Ser 


Asn 


Thr 


Pro 


Gly 


Pro 


Leu 


Asp 


Asn 


Thr 


Ser 


Glu 


Glu 






1045 








1050 






1055 




Thr 


Thr 


Glu Arg 


Ile


Ser 


Asn 


Asn 


Glu 


Tyr


Lys 


Val 


Asn 


Glu Arg


Glu 






1060 








1065 








1070 






Asp 


Glu 


Arg Thr 


Leu 


Thr 


Lys 


Glu 


Tyr 


Glu 


Asp 


Ile


Val 


Leu 


Lys 


Ser 




1075 






1080 








1085 






His 


Met 


Asn Arg 


Glu 


Ser 


Asp 


Asp 


Gly Glu 


Leu 


Tyr 


Asp 


Glu 


Asn 


Ser 


1090 






1095








1100 








Asp 


Leu 


Ser Thr 


Val 


Asn 


Asp 


Glu 


Ser 


Glu 


Asp 


Ala 


Glu 


Ala 


Lys 


Met 


1105 






1110 








1115 






1120 


Lys 


Gly Asn Asp 


Thr 


Ser 


Glu 


Met 


Ser 


His 


Asn 


Ser 


Ser 


Gln


His 


Ile






1125 








1130 








1135 




Glu 


Ser 


Asp Gln


Gln


Lys 


Asn Asp 


Met 


Lys 


Thr 


Val 


Gly Asp 


Leu 


Gly 






1140 








1145 








1150 






Thr 


Thr 


His Val 


Gln


Asn 


Glu 


Ile


Ser 


Val 


Pro 


Val 


Thr 


Gly Glu 


Ile




1155 






1160 








1165 








Asp 


Glu 


Lys Leu 


Arg 


Glu 


Ser 


Lys 


Glu 


Ser 


Lys 


Ile


His 


Lys 


Ala 


Glu 


1170 






1175 








1180 








Glu 


Glu 


Arg Leu 


Ser 


His 


Thr 


Asp 


Ile


His 


Lys 


Ile


Asn 


Pro 


Glu 


Asp 


1185 






1190 








1195 








1200 


Arg 


Asn 


Ser Asn 


Thr 


Leu 


His 


Leu 


Lys 


Asp 


Ile


Arg 


Asn 


Glu 


Glu 


Asn 






1205 








1210 








1215 




Glu 


Arg 


His Leu 


Thr 


Asn 


Gln


Asn 


Ile


Asn 


Ile


Ser 


Gln


Glu 


Arg 


Asp 






1220 








1225 








1230 


Leu 


Gln


Lys His 


Gly 


Phe 


His 


Thr 


Met 


Asn 


Asn 


Leu 


His 


Gly Asp 


Gly 




1235 






1240 








1245 






Val 


Ser 


Glu Arg 


Ser 


Gln


Ile


Asn 


His 


Ser 


His 


His 


Gly Asn 


Arg 


Gln


1250 






1255 








1260 








Asp 


Arg 


Gly Gly Asn 


Ser 


Gly Asn Val 


Leu 


Asn 


Met 


Arg 


Ser 


Asn 


Asn 


1265 






1270 








1275 








1280 


Asn 


Asn 


Phe Asn 


Asn 


Ile


Pro 


Ser 


Arg 


Tyr 


Asn


Leu 


Tyr 


Asp 


Lys 


Lys 






1285 








1290 








1295


Leu 


Asp 


Leu Asp 


Leu 


Tyr 


Glu 


Asn 


Arg 


Asn 


Asp 


Ser 


Thr 


Thr 


Lys 


Glu 






1300 








1305 








1310 






Leu 


Ile


Lys Lys 


Leu 


Ala 


Glu 


Ile


Asn 


Lys 


Cys 


Glu 


Asn 


Glu 


Ile


Ser 




1315 






1320 








1325 








Val 


Lys 


Tyr Cys 


Asp 


His 


Met 


Ile


His 


Glu 


Glu 


Ile


Pro 


Leu 


Lys 


Thr 


1330 






1335 








1340 










Cys 


Thr 


Lys Glu 


Lys 


Thr 


Arg 


Asn 


Leu 


Cys 


Cys 


Ala 


Val 


Ser 


Asp 


Tyr 


1345 






1350 








1355 








1360 


Cys 


Met 


Ser Tyr 


Phe 


Thr 


Tyr 


Asp 


Ser 


Glu 


Glu 


Tyr 


Tyr 


Asn 


Cys 


Thr 






1365 








1370 








1375 




Lys 


Arg 


Glu Phe 


Asp 


Asp 


Pro 


Ser 


Tyr 


Thr 


Cys 


Phe 


Arg 


Lys 


Glu 


Ala 






1380 








1385 








1390 






Phe 


Ser 


Ser Met 


Ile


Phe 


Lys 


Phe 


Leu 


Ile


Thr 


Asn 


Lys 


Ile


Tyr 


Tyr 
Tyr Phe Tyr Thr Tyr Lys Thr Ala Lys Val Thr Ile Lys Lys Ile Asn

1415 1420 

?^ Ser Leu Ile Phe Phe Phe Phe Phe Ser Phe
1425 1430 1435 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2288 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

^^™ C TTCCGGCTCG TATGTTGTGT GGAATTGTGA GCGGATAACA ATTTCAPAr A «n 
G ^^T ATGACCATGA TTACGCCAAG CTCTAATACG ACT^CTMAGgSaSotG^JS 
GTACGCCTGC AGGTCCGGTC CGGAATTCAA TAAAATATTT CCAGAAAGGA ATrTrrJSIS ion 
JS^ TCC AATATATTCA AGGAATATAA AGAAAATAAT GTAGATATCA t1St?Sa15 lln 
^ AA I TAT GAATATAATA ATTTCTGTAA AGAAAAACCT GAATTAGTAT CTGCTGCCAA ?iri 

?J A I^™ AAAGCTCCAAATGCTAAATC ccctagaata tacaaatctISgSt^ lio 

SS^IS™ 1 GTGTTTGGTT GCAAAACGAA AATCAGTAAA GTTAAAAAAA jStGgSSS? 
^ A ? AG ™ T AATAAAGTAA CTAAACCTGA AGGTGTATGT GGACCACCAA GAAGGCAA^A 
SEKSS GGATATATAT TTTTGATTCG CGACGGTAAC G AGGAAGGAT TAAAAGATCA 
^TAATAAG GCAGCTAATT ATGAGGCAAT GCATTTAAAA GAGAAATATG AGAATGCTG^ 
SSS^*** ATTTGC AATG CTATATTGGG AAGTTATGCA GATATTGGAG A^TgSS 
A ™F GGAT GTTTGGAGGG ATATAAATAC TAATAAATTA TCAGAAAAAT TCOkSSJi ? 
JI^ATGGGT GGTGGTAATT CTAGGAAAAA acaaaacgat aataatgaa? GTA^TAAATG 
GTGGGAAAAA CAAAGGAATT TAATATGGTC TAGTATGGTA AAACACATTC SilSSil? 
AACATGTAAA CGTCATAATA ATTTTG AGAA AATTCCT C AA T^SSS St^A^SSa* 
iKS? 2Ef? GAATTTTG TG AGGAAATGGG TACGGAAGTC SoSS^SSSSSKS 960 
TGAAAATAAA AATTGTTCGG AAAAAAAATG TAAAAATGCA rar»mm •wn^ff^J^, 
GATAAAGGAA CGAAAAAATG AATATAATTT GCAATcSS aSt^SS SJSJiJX?- 
™MA AAAAACAATC TTTATAATAA ATTTGAGGAT T^AAAG^T? aSSSgJS r?!n 

tgaatcaaaa cagtgctcaa atatagaatt taatgatgaa acatttaca? TTCOTAAra? 
iS2J^ gcttgtat gg tatgtgaaaa tcctSSSS tSaUgS? SSSS Ilea 
AAAAA ^S AA I gtgtttccta tagaggaatc aaaaaaatct gagtta?Sa gttta£32a Ilia 

1™^™ AATACTCCTA ATAGTTCTGG TGGGGGAAAT TATGGAGATA ScAAaSSc" lllo 
-iSSJiSS? ™ GTTCATC ATGATGGTCC TAAGGAAGTG AAATCCGGAG AAAAaSIS £ J? 
ACCAAAAATA GATGCAGCTG TTAAAACAGA AAATGAATTT ACCTCTAATC GAAarririi 1 c 
l G ^^ G GAAAAAAGTA AAGGTGATCA TTCTTCTCCT GtSSt^A SSraSSS ll» 
AAA ^ AGGAA CCACAAAGGG TGGTGTCTGA AAATTTACCT AAAATTGAAG AGAAAATGGA £112 

^cttctgat tctataccaa ttactcatat agaagctgaa aagggtcagt cttctaa??? xllo 

^^ TAAT GATCCTGCAG TAGTAAGTGG TAGAGAATCT AAAGATGTAA ATCTTCATaS 
TTCTGAAAGG ATTAAAGAAA ATGAAGAAGG TGTGATTAAA ACAGATGATA GTTCaSSS 1800 
^TJG^TT TCTAAAATAC CATCTGACCA AAATAATCAT AGTGATTTAT CACAGAATGC Ilea 
AAATGAGGAC TCTAATCAAG GGAATAAGGA AACAATAAAT CCTCCTTCTA CAGAAaISK llto 
^ CAAA GAA ATTCATTATA AAACATCTGA TTCTGATGAT CATGGTTCTA A^A^aIS llto 
JGAAATTGAA CCAAAGGAGT TAACGGAGGA ATCACCTCTT ACTGATAAAA AAACTGAAAP onln 
TGCAGCGATT GGTGATAAAA ATCATGAATC AGTAAAAAGC GCtStATTT ?Sc5S 2?JS 
GATTCATAAT TCTGATAATA GAGATAGAAT TGTTTCTGAA AGTGTAGTTC AGGATTCTTC 2160 

aggaagctct atgagtactg aatctatacg tactgataac aaggatttta aaacaagtga 222S 

GGATATTGCA CCTTCTATTA ATGGTCGGAA TTCCCGGGTC GACGAgSS SSSSSS 

2288 



420 
480 
540 
600 
660 
720 
780 
840 
900 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 749 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ala Asp Asn Asn Phe Thr Gln Glu Thr Ala Met Thr Met Ile Thr Pro
1 5 10 15

Ser Ser Asn Thr Thr His Tyr Arg Glu Ser Trp Tyr Ala Cys Arg Ser
20 25 30

Gly Pro Glu Phe Asn Lys Ile Phe Pro Glu Arg Asn Val Gln Ile His
35 40 45

Ile Ser Asn Ile Phe Lys Glu Tyr Lys Glu Asn Asn Val Asp Ile Ile
50 55 60

Phe Gly Thr Leu Asn Tyr Glu Tyr Asn Asn Phe Cys Lys Glu Lys Pro
65 70 75 80

Glu Leu Val Ser Ala Ala Lys Tyr Asn Leu Lys Ala Pro Asn Ala Lys
85 90 95

Ser Pro Arg Ile Tyr Lys Ser Lys Glu His Glu Glu Ser Ser Val Phe
100 105 110

Gly Cys Lys Thr Lys Ile Ser Lys Val Lys Lys Lys Trp Asn Cys Tyr
115 120 125

Ser Asn Asn Lys Val Thr Lys Pro Glu Gly Val Cys Gly Pro Pro Arg
130 135 140

Arg Gln Gln Leu Cys Leu Gly Tyr Ile Phe Leu Ile Arg Asp Gly Asn
145 150 155 160

Glu Glu Gly Leu Lys Asp His Ile Asn Lys Ala Ala Asn Tyr Glu Ala
165 170 175

Met His Leu Lys Glu Lys Tyr Glu Asn Ala Gly Gly Asp Lys Ile Cys
180 185 190

Asn Ala Ile Leu Gly Ser Tyr Ala Asp Ile Gly Asp Ile Val Arg Gly
195 200 205

Leu Asp Val Trp Arg Asp Ile Asn Thr Asn Lys Leu Ser Glu Lys Phe
210 215 220

Gln Lys Ile Phe Met Gly Gly Gly Asn Ser Arg Lys Lys Gln Asn Asp
225 230 235 240

Asn Asn Glu Arg Asn Lys Trp Trp Glu Lys Gln Arg Asn Leu Ile Trp
245 250 255

Ser Ser Met Val Lys His Ile Pro Lys Gly Lys Thr Cys Lys Arg His
260 265 270

Asn Asn Phe Glu Lys Ile Pro Gln Phe Leu Arg Trp Leu Lys Glu Trp
275 280 285

Gly Asp Glu Phe Cys Glu Glu Met Gly Thr Glu Val Lys Gln Leu Glu
290 295 300

Lys Ile Cys Glu Asn Lys Asn Cys Ser Glu Lys Lys Cys Lys Asn Ala
305 310 315 320

Cys Ser Ser Tyr Glu Lys Trp Ile Lys Glu Arg Lys Asn Glu Tyr Asn
325 330 335

Leu Gln Ser Lys Lys Phe Asp Ser Asp Lys Lys Leu Asn Lys Lys Asn
340 345 350

Asn Leu Tyr Asn Lys Phe Glu Asp Ser Lys Ala Tyr Leu Arg Ser Glu
355 360 365

Ser Lys Gln Cys Ser Asn Ile Glu Phe Asn Asp Glu Thr Phe Thr Phe
370 375 380

Pro Asn Lys Tyr Lys Glu Ala Cys Met Val Cys Glu Asn Pro Ser Ser
385 390 395 400

Ser Lys Ala Leu Lys Pro Ile Lys Thr Asn Val Phe Pro Ile Glu Glu
405 410 415

Ser Lys Lys Ser Glu Leu Ser Ser Leu Thr Asp Lys Ser Lys Asn Thr
420 425 430

Pro Asn Ser Ser Gly Gly Gly Asn Tyr Gly Asp Arg Gln Ile Ser Lys
435 440 445

Arg Asp Asp Val His His Asp Gly Pro Lys Glu Val Lys Ser Gly Glu
450 455 460

Lys Glu Val Pro Lys Ile Asp Ala Ala Val Lys Thr Glu Asn Glu Phe
465 470 475 480

Thr Ser Asn Arg Asn Asp Ile Glu Gly Lys Glu Lys Ser Lys Gly Asp
485 490 495

His Ser Ser Pro Val His Ser Lys Asp Ile Lys Asn Glu Glu Pro Gln
500 505 510

Arg Val Val Ser Glu Asn Leu Pro Lys Ile Glu Glu Lys Met Glu Ser
515 520 525

Ser Asp Ser Ile Pro Ile Thr His Ile Glu Ala Glu Lys Gly Gln Ser
530 535 540

Ser Asn Ser Ser Asp Asn Asp Pro Ala Val Val Ser Gly Arg Glu Ser
545 550 555 560

Lys Asp Val Asn Leu His Thr Ser Glu Arg Ile Lys Glu Asn Glu Glu
565 570 575

Gly Val Ile Lys Thr Asp Asp Ser Ser Lys Ser Ile Glu Ile Ser Lys
580 585 590

Ile Pro Ser Asp Gln Asn Asn His Ser Asp Leu Ser Gln Asn Ala Asn
595 600 605

Glu Asp Ser Asn Gln Gly Asn Lys Glu Thr Ile Asn Pro Pro Ser Thr
610 615 620

Glu Lys Asn Leu Lys Glu Ile His Tyr Lys Thr Ser Asp Ser Asp Asp
625 630 635 640

His Gly Ser Lys Ile Lys Ser Glu Ile Glu Pro Lys Glu Leu Thr Glu
645 650 655

Glu Ser Pro Leu Thr Asp Lys Lys Thr Glu Ser Ala Ala Ile Gly Asp
660 665 670

Lys Asn His Glu Ser Val Lys Ser Ala Asp Ile Phe Gln Ser Glu Ile
675 680 685

His Asn Ser Asp Asn Arg Asp Arg Ile Val Ser Glu Ser Val Val Gln
690 695 700

Asp Ser Ser Gly Ser Ser Met Ser Thr Glu Ser Ile Arg Thr Asp Asn
705 710 715 720

Lys Asp Phe Lys Thr Ser Glu Asp Ile Ala Pro Ser Ile Asn Gly Arg
725 730 735

Asn Ser Arg Val Asp Glu Leu Thr Ser Arg Arg Pro Leu
740 745

(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2606 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AGCTCTATTA CGACTCACTA TAGGGAAAGC TGGTACGCCT GCAGGTACCG GTCCGGAATT 60
CCCGGGTCGA CGAGCTCACT AGTCGGCGGC CGCTCTAGAG GATCCAAGCT TAATAGTGTT 120
TATACGTCTA TTGGCTTATT TTTAAATAGC TTAAAAAGCG GACCATGTAA AAAGGATAAT 180
GATAATGCAG AGGATAATAT AGATTTTGGT GATGAAGGTA AAACATTTAA AGAGGCAGAT 240
AATTGTAAAC CATGTTCTCA ATTTACTGTT GATTGTAAAA ATTGTAATGG TGGTGATACA 300
AAAGGGAAGT GCAATGGCAG CAATGGCAAA AAGAATGGAA ATGATTATAT TACTGCAAGT 360
GATATTGAAA ATGGAGGGAA TTCTATTGGA AATATAGATA TGGTTGTTAG TGATAAGGAT 420
GCAAATGGAT TTAATGGTTT AGACGCTTGT GGAAGTGCAA ATATCTTTAA AGGTATTAGA 480
AAAGAACAAT GGAAATGTGC TAAAGTATGT GGTTTAGATG TATGTGGTCT TAAAAATGGT 540
AATGGTAGTA TAGATAAAGA TCAAAAACAA ATTATAATTA TTAGAGCATT GCTTAAACGT 600
TGGGTAGAAT ATTTTTTAGA AGATTATAAT AAAATTAATG CCAAAATTTC ACATTGTACG 660
AAAAAGGATA ATGAATCCAC ATGTACAAAT GATTGTCCAA ATAAATGTAC ATGTGTAGAA 720
GAGTGGATAA ATCAGAAAAG GACAGAATGG AAAAATATAA AAAAACATTA CAAAACACAA 780
AATGAAAATG GTGACAATAA CATGAAATCT TTGGTTACAG ATATTTTGGG TGCCTTGCAA 840
CCCCAAAGTG ATGTTAACAA AGCTATAAAA CCTTGTAGTG GTTTAACTGC GTTCGAGAGT 900
TTTTGTGGTC TTAATGGCGC TGATAACTCA GAAAAAAAAG AAGGTGAAGA TTACGATCTT 960
GTTCTATGTA TGCTTAAAAA TCTTGAAAAA CAAATTCAGG AGTGCAAAAA GAAACATGGC 1020
GAAACTAGTG TCGAAAATGG TGGCAAATCA TGTACCCCCC TTGACAACAC CACCCTTGAG 1080
GAGGAACCCA TAGAAGAGGA AAACCAAGTG GAAGCGCCGA ACATTTGTCC AAAACAAACA 1140
GTGGAAGATA AAAAAAAAGA GGAAGAAGAA GAAACTTGTA CACCGGCATC ACCAGTACCA 1200
GAAAAACCGG TACCTCATGT GGCACGTTGG CGAACATTTA CACCACCTGA GGTATTCAAG 1260
ATATGGAGGG GAAGGAGAAA TAAAACTACG TGCGAAATAG TGGCAGAAAT GCTTAAAGAT 1320
AAGAATGGAA GGACTACAGT AGGTGAATGT TATAGAAAAG AAACTTATTC TGAATGGACG 1380
TGTGATGAAA GTAAGATTAA AATGGGACAG CATGGAGCAT GTATTCCTCC AAGAAGACAA 1440
AAATTATGTT TACATTATTT AGAAAAAATA ATGACAAATA CAAATGAATT GAAATACGCA 1500
TTTATTAAAT GTGCTGCAGC AGAAACTTTT TTGTTATGGC AAAACTACAA AAAAGATAAG 1560
AATGGTAATG CAGAAGATCT CGATGAAAAA TTAAAAGGTG GTATTATCCC CGAAGATTTT 1620
AAACGGCAAA TGTTCTATAC GTTTGCAGAT TATAGAGATA TATGTTTGGG TACGGATATA 1680
TCATCAAAAA AAGATACAAG TAAAGGTGTA GGTAAAGTAA AATGCAATAT TGATGATGTT 1740
TTTTATAAAA TTAGCAATAG TATTCGTTAC CGTAAAAGTT GGTGGGAAAC AAATGGTCCA 1800
GTTATATGGG AAGGAATGTT ATGCGCTTTA AGTTATGATA CGAGCCTAAA TAATGTTAAT 1860
CCGGAAACTC ACAAAAAACT TACCGAAGGC AATAACAACT TTGAGAAAGT CATATTTGGT 1920
AGTGATAGTA GCACTACTTT GTCCAAATTT TCTGAAAGAC CTCAATTTCT AAGATGGTTG 1980
ACTGAATGGG GAGAAAATTT CTGCAAAGAA CAAAAAAAGG AGTATAAGGT GTTGTTGGCA 2040
AAATGTAAGG ATTGTGATGT TGATGGTGAT GGTAAATGTA ATGGAAAATG TGTTGCGTGC 2100
AAAGATCAAT GTAAACAATA TCATAGTTGG ATTGGAATAT GGATAGATAA TTATAAAAAA 2160
CAAAAAGGAA GATATACTGA GGTTAAAAAA ATACCTCTGT ATAAAGAAGA TAAAGACGTG 2220
AAAAACTCAG ATGATGCTCG CGATTATTTA AAAACACAAT TACAAAATAT GAAATGTGTA 2280
AATGGAACTA CTGATGAAAA TTGTGAGTAT AAGTGTATGC ATAAAACCTC ATCCACAAAT 2340
AGTGATATGC CCGAATCGTT GGACGAAAAG CCGGAAAAGG TCAAAGACAA GTGTAATTGT 2400
GTACCTAATG AATGCAATGC ATTGAGTGTA AGTGGTAGCG GTTTTCCTGA TGGTCAAGCT 2460
TACGTACGCG TGCATGCGAC GTCATAGCTC TTCTATAGTG TCACCTAAAT TCAATTCACT 2520
GGCCGTCGTT TTACAACGTC GTGACTGGGA AAACCTGGCG TTACCCAACT TAATCGCCTT 2580
GCAGCACATC CCCCTTTCGC CAGCTG 2606



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 921 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Lys Leu Asn Ser Val Tyr Thr Ser Ile Gly Leu Phe Leu Asn Ser Leu
1 5 10 15

Lys Ser Gly Pro Cys Lys Lys Asp Asn Asp Asn Ala Glu Asp Asn Ile
20 25 30

Asp Phe Gly Asp Glu Gly Lys Thr Phe Lys Glu Ala Asp Asn Cys Lys
35 40 45

Pro Cys Ser Gln Phe Thr Val Asp Cys Lys Asn Cys Asn Gly Gly Asp
50 55 60

Thr Lys Gly Lys Cys Asn Gly Ser Asn Gly Lys Lys Asn Gly Asn Asp
65 70 75 80

Tyr Ile Thr Ala Ser Asp Ile Glu Asn Gly Gly Asn Ser Ile Gly Asn
85 90 95

Ile Asp Met Val Val Ser Asp Lys Asp Ala Asn Gly Phe Asn Gly Leu
100 105 110

Asp Ala Cys Gly Ser Ala Asn Ile Phe Lys Gly Ile Arg Lys Glu Gln
115 120 125

Trp Lys Cys Ala Lys Val Cys Gly Leu Asp Val Cys Gly Leu Lys Asn
130 135 140

Gly Asn Gly Ser Ile Asp Lys Asp Gln Lys Gln Ile Ile Ile Ile Arg
145 150 155 160

Ala Leu Leu Lys Arg Trp Val Glu Tyr Phe Leu Glu Asp Tyr Asn Lys
165 170 175

Ile Asn Ala Lys Ile Ser His Cys Thr Lys Lys Asp Asn Glu Ser Thr
180 185 190

Cys Thr Asn Asp Cys Pro Asn Lys Cys Thr Cys Val Glu Glu Trp Ile
195 200 205

Asn Gln Lys Arg Thr Glu Trp Lys Asn Ile Lys Lys His Tyr Lys Thr
210 215 220

Gln Asn Glu Asn Gly Asp Asn Asn Met Lys Ser Leu Val Thr Asp Ile
225 230 235 240

Leu Gly Ala Leu Gln Pro Gln Ser Asp Val Asn Lys Ala Ile Lys Pro
245 250 255

Cys Ser Gly Leu Thr Ala Phe Glu Ser Phe Cys Gly Leu Asn Gly Ala
260 265 270

Asp Asn Ser Glu Lys Lys Glu Gly Glu Asp Tyr Asp Leu Val Leu Cys
275 280 285

Met Leu Lys Asn Leu Glu Lys Gln Ile Gln Glu Cys Lys Lys Lys His
290 295 300

Gly Glu Thr Ser Val Glu Asn Gly Gly Lys Ser Cys Thr Pro Leu Asp
305 310 315 320

Asn Thr Thr Leu Glu Glu Glu Pro Ile Glu Glu Glu Asn Gln Val Glu
325 330 335

Ala Pro Asn Ile Cys Pro Lys Gln Thr Val Glu Asp Lys Lys Lys Glu
340 345 350

Glu Glu Glu Glu Thr Cys Thr Pro Ala Ser Pro Val Pro Glu Lys Pro
355 360 365

Val Pro His Val Ala Arg Trp Arg Thr Phe Thr Pro Pro Glu Val Phe
370 375 380

Lys Ile Trp Arg Gly Arg Arg Asn Lys Thr Thr Cys Glu Ile Val Ala
385 390 395 400

Glu Met Leu Lys Asp Lys Asn Gly Arg Thr Thr Val Gly Glu Cys Tyr
405 410 415

Arg Lys Glu Thr Tyr Ser Glu Trp Thr Cys Asp Glu Ser Lys Ile Lys
420 425 430

Met Gly Gln His Gly Ala Cys Ile Pro Pro Arg Arg Gln Lys Leu Cys
435 440 445

Leu His Tyr Leu Glu Lys Ile Met Thr Asn Thr Asn Glu Leu Lys Tyr
450 455 460

Ala Phe Ile Lys Cys Ala Ala Ala Glu Thr Phe Leu Leu Trp Gln Asn
465 470 475 480

Tyr Lys Lys Asp Lys Asn Gly Asn Ala Glu Asp Leu Asp Glu Lys Leu
485 490 495

Lys Gly Gly Ile Ile Pro Glu Asp Phe Lys Arg Gln Met Phe Tyr Thr
500 505 510

Phe Ala Asp Tyr Arg Asp Ile Cys Leu Gly Thr Asp Ile Ser Ser Lys
515 520 525



Lys Asp Thr Ser Lys Gly Val Gly Lys Val Lys Cys Asn Ile Asp Asp
530 535 540

Val Phe Tyr Lys Ile Ser Asn Ser Ile Arg Tyr Arg Lys Ser Trp Trp
545 550 555 560

Glu Thr Asn Gly Pro Val Ile Trp Glu Gly Met Leu Cys Ala Leu Ser
565 570 575

Tyr Asp Thr Ser Leu Asn Asn Val Asn Pro Glu Thr His Lys Lys Leu
580 585 590

Thr Glu Gly Asn Asn Asn Phe Glu Lys Val Ile Phe Gly Ser Asp Ser
595 600 605

Ser Thr Thr Leu Ser Lys Phe Ser Glu Arg Pro Gln Phe Leu Arg Trp
610 615 620

Leu Thr Glu Trp Gly Glu Asn Phe Cys Lys Glu Gln Lys Lys Glu Tyr
625 630 635 640

Lys Val Leu Leu Ala Lys Cys Lys Asp Cys Asp Val Asp Gly Asp Gly
645 650 655

Lys Cys Asn Gly Lys Cys Val Ala Cys Lys Asp Gln Cys Lys Gln Tyr
660 665 670

His Ser Trp Ile Gly Ile Trp Ile Asp Asn Tyr Lys Lys Gln Lys Gly
675 680 685

Arg Tyr Thr Glu Val Lys Lys Ile Pro Leu Tyr Lys Glu Asp Lys Asp
690 695 700

Val Lys Asn Ser Asp Asp Ala Arg Asp Tyr Leu Lys Thr Gln Leu Gln
705 710 715 720

Asn Met Lys Cys Val Asn Gly Thr Thr Asp Glu Asn Cys Glu Tyr Lys
725 730 735

Cys Met His Lys Thr Ser Ser Thr Asn Ser Asp Met Pro Glu Ser Leu
740 745 750

Asp Glu Lys Pro Glu Lys Val Lys Asp Lys Cys Asn Cys Val Pro Asn
755 760 765

Glu Cys Asn Ala Leu Ser Val Ser Gly Ser Gly Phe Pro Asp Gly Gln
770 775 780

Ala Phe Gly Gly Gly Val Leu Glu Gly Thr Cys Lys Gly Leu Gly Glu
785 790 795 800

Pro Lys Lys Lys Ile Glu Pro Pro Gln Tyr Asp Pro Thr Asn Asp Ile
805 810 815

Leu Lys Ser Thr Ile Pro Val Thr Ile Val Leu Ala Leu Gly Ser Ile
820 825 830

Ala Phe Leu Phe Met Lys Val Ile Tyr Ile Tyr Val Trp Tyr Ile Tyr
835 840 845

Met Leu Cys Val Gly Ala Leu Asp Thr Tyr Ile Cys Gly Cys Ile Cys
850 855 860

Ile Cys Ile Phe Ile Cys Val Ser Val Tyr Val Cys Val Tyr Val Tyr
865 870 875 880

Val Phe Leu Tyr Met Cys Val Phe Tyr Ile Tyr Phe Ile Tyr Ile Tyr
885 890 895

Val Phe Ile Leu Lys Met Lys Lys Met Lys Lys Met Lys Lys Met Lys
900 905 910

Lys Met Lys Lys Arg Lys Lys Arg Ile
915 920

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2101 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

GACGAATGGC GCACAAGAAC GCTACATAGA TGATGCCAAA PPAor a^aSS f^j; < * T '* a, TGAA 360 

2101

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 700 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Glu Gln Gly Asp Asn Lys Val Gly Ala Cys Ala Pro Tyr Arg Arg Leu
1 5 10 15

His Leu Cys Asp Tyr Asn Leu Glu Ser Ile Asp Thr Thr Ser Thr Thr
20 25 30

His Lys Leu Leu Leu Glu Val Cys Met Ala Ala Lys Tyr Glu Gly Asn
35 40 45
Ser Ile Asn Thr His Tyr Thr Gln His Gln Arg Thr Asn Glu Asp Ser
50 55 60

Ala Ser Gln Leu Cys Thr Val Leu Ala Arg Ser Phe Ala Asp Ile Gly
65 70 75 80

Asp Ile Val Arg Gly Lys Asp Leu Tyr Leu Gly Tyr Asp Asn Lys Glu
85 90 95

Lys Glu Gln Arg Lys Lys Leu Glu Gln Lys Leu Lys Asp Ile Phe Lys
100 105 110

Lys Ile His Lys Asp Val Met Lys Thr Asn Gly Ala Gln Glu Arg Tyr
115 120 125

Ile Asp Asp Ala Lys Gly Gly Asp Phe Phe Gln Leu Arg Glu Asp Trp
130 135 140

Trp Thr Ser Asn Arg Glu Thr Val Trp Lys Ala Leu Ile Cys His Ala
145 150 155 160

Pro Lys Glu Ala Asn Tyr Phe Ile Lys Thr Ala Cys Asn Val Gly Lys
165 170 175

Gly Thr Asn Gly Gln Cys His Cys Ile Gly Gly Asp Val Pro Thr Tyr
180 185 190

Phe Asp Tyr Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu
195 200 205

Asp Phe Cys Arg Lys Lys Lys Lys Lys Leu Glu Asn Leu Gln Lys Gln
210 215 220

Cys Arg Asp Tyr Glu Gln Asn Leu Tyr Cys Ser Gly Asn Gly Tyr Asp
225 230 235 240

Cys Thr Lys Thr Ile Tyr Lys Lys Gly Lys Leu Val Ile Gly Glu His
245 250 255

Cys Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp
260 265 270

Asn Gln Lys Lys Glu Phe Leu Lys Gln Lys Arg Lys Tyr Glu Thr Glu
275 280 285

Ile Ser Gly Gly Gly Ser Gly Lys Ser Pro Lys Arg Thr Lys Arg Ala
290 295 300

Ala Arg Ser Ser Ser Ser Ser Asp Asp Asn Gly Tyr Glu Ser Lys Phe
305 310 315 320

Tyr Lys Lys Leu Lys Glu Val Gly Tyr Gln Asp Val Asp Lys Phe Leu
325 330 335

Lys Ile Leu Asn Lys Glu Gly Ile Cys Gln Lys Gln Pro Gln Val Gly
340 345 350

Asn Glu Lys Ala Asp Asn Val Asp Phe Thr Asn Glu Lys Tyr Val Lys
355 360 365

Thr Phe Ser Arg Thr Glu Ile Cys Glu Pro Cys Pro Trp Cys Gly Leu
370 375 380

Glu Lys Gly Gly Pro Pro Trp Lys Val Lys Gly Asp Lys Thr Cys Gly
385 390 395 400

Ser Ala Lys Thr Lys Thr Tyr Asp Pro Lys Asn Ile Thr Asp Ile Pro
405 410 415

Val Leu Tyr Pro Asp Lys Ser Gln Gln Asn Ile Leu Lys Lys Tyr Lys
420 425 430

Asn Phe Cys Glu Lys Gly Ala Pro Gly Gly Gly Gln Ile Lys Lys Trp
435 440 445

Gln Cys Tyr Tyr Asp Glu His Arg Pro Ser Ser Lys Asn Asn Asn Asn
450 455 460

Cys Val Glu Gly Thr Trp Asp Lys Phe Thr Gln Gly Lys Gln Thr Val
465 470 475 480

Lys Ser Tyr Asn Val Phe Phe Trp Asp Trp Val His Asp Met Leu His
485 490 495

Asp Ser Val Glu Trp Lys Thr Glu Leu Ser Lys Cys Ile Asn Asn Asn
500 505 510

Thr Asn Gly Asn Thr Cys Arg Asn Asn Asn Lys Cys Lys Thr Asp Cys
515 520 525

Gly Cys Phe Gln Lys Trp Val Glu Lys Lys Gln Gln Glu Trp Met Ala
530 535 540

Ile Lys Asp His Phe Gly Lys Gln Thr Asp Ile Val Gln Gln Lys Gly
545 550 555 560
Leu Ile Val Phe Ser Pro Tyr Gly Val Leu Asp Leu Val Leu Lys Gly
Gly Asn Leu Leu Gln Asn Ile Lys Asp Val His Gly Asp Thr Asp'Lp

Ile Lys His Ile Lys Lys Leu Leu Asp Glu Glu Asp Ala VaTlla Val
Val Leu Gly Gly Lys Asp As^Thr Thr Ile Asp Lys Leu Leu Gln His
Glu Lys Glu Gln Ala Glu Gln Cys Lys Gln Lys Gln Glu Glu Cys Glu
Lys Lys Ala Gln G ^ 5 Glu Ser ^ Ser Ala Glu Thr Arg Glu

Asp Glu Arg Thr Gln Gln Pro Ala Asp Ser Ala Gly Glu Val Glu Glu
Glu Glu Asp Asp Asp Asp Tyr Asp Glu Asp Asp Glu Asp As^Asp Val

Val Gln Asp Val Asp Val Ser Glu Ile Arg Gly Pro
695 700
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8220 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Plasmodium falciparum

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

60 

TGGTAAAGGC TTGCAAGgXc GtSgt?S££ S£I taf™ AAAGAGGAAG CTAAAGAACG 120 
ACAAACACCA GAAGATCCa5 G^St^Sga JSSSJi™ GAGAAAAATG AAAGCGATCC 180 

TGTAATTAAT OCGToSSS SSSSSS SSSSto SSSgS SSSS"** 24 ° 
ATGTACACAT AATAGAATAA AAGATirTri a™£™™ TCCGATGAAT ATGGAGGTCA 3 00 
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TGGTGATACA ATAAATAGTG GTGGTAGTGG 
TAGACAGGAA TTGTATGAAG AATGGAAATG 
ACACGATGAG GATGACGAGG AGGATTATGA 
ATTAAAAAAC CAAAAAAAGA ATAAAGAAGA 
5 TGAAATCCAA AAGACATTCA ATCCTTTTTT 
TTCCATACAT TGGAAAAAAA AACTTCAGAG 
TGGAAACAAT AAATGTAATA ATGATTGTGA 
AGACGAATGG GGGAAAATAG TACAACATTT 
TAGTGACAAT ACGGCAGAAT TAATCCCATT 

10 GCAAGAAGAA TTTTTGAAAG GCGATTCCGA 
TCTGGATGCA GAGGAGGCAG AGGAACTAAA 
CAATAATCAA GAAGCATCTG TTGGTGGTGG 
ATTGCTCAAC TACGAAAAAG ACGAAGCCGA 
AG AGG AAAAA GAAAAAGGAG ACGGAAACGA 

15 TAATCCATGT AGTGGCGAAA GTGGTAACAA 
GTATCAAATG CATCACAAGG CAAAGACACA 
GAGAGGTGAT ATATCCTTAG CGCAATTTAA 
ACAAATTTGC AAAATTAACG AAAACTATTC 
ATGTACAGGC AAAGATGGAG ATCACGGAGG 

20 AAATATTGAA GGAAAAAAAC AAACGTCATA 
ACACATGTGT ACATCCAATT TAGAAAATTT 
GGCTAGCCAC TCATTATTGG GAGATGTTCA 
AATAAAACGC TATAAAGATC AAAATAATAT 
CCAGGAGGCT ATGTGTCGAG GTGTACGTTA 

25 AGG AAGA GAT ATGTGGGATG AGGATAAGAG 
CGTATTTAAA AACATTAAAG AAAAACATGA 
TGATGAAAGC AAAAAGCCCG CATATAAAAA 
ACATCAAGTG TGGAGAGCCA TGAAATGCGC 
AGTTGACGAT TATATCCCCC AACGTTTACG 

30 TAAAGCGCAA TCACAGGAGT ATGACAAGTT 
GGGTGATGGA AAATGTACGC AAGGTGATGT 
TAAATATAAA GAGGAAATAG AAAAATGGAA 
CAATCTATTA TACCTACAAG CAAAAACTAC 
TGATGACGAT CCCGACTATC AACAAATGGT 

35 TATTGCCGCA CGTGTTCTTG TTAAACGTGC 
CGCCCCGATC ACCCCCTACA GTACTGCTGC 
GGGGTGCCAG GAACAAACAC AATTTTGTGA 
CACGAAAGAA AACAAAGAAT ACACCTTTAA 
TGATTGCATA AATAGGTCGC AAACAGAGGA 

40 TGCCTGCAAA ATAGTGGAGA AAATACTTGA 
ATGTAATCCA AAAGAGAGTT ATCCTGATTG 
TGATGGTGCT TGTATGCCTC CAAGGAGACA 
GAGTCAAACA GAAAATATAA AAACAGACGA 
AGCAGCAGAA ACTTTTCTTT CATGGCAATA 

45 AATATTAGAT AGAGGCCTTA TTCCATCCCA 
AGATTATAGA GATATATGTT TGAACACAGA 
GGCAAAAGAT AAAATAGGTA AATTTTTCTC 
ATCACGCCAA GAATGGTGGA AAACAAATGG 
CTTAACAAAA TACGTCACAG ATACCGATAA 

50 CGATAAAGTC AACCAATCCC AAAATGGCAA 
TCAATTTCTA CGTTGGATGA TCGAATGGGG 
GGAAAATATC ATAAAAGATG CATGTAATGA 
GAAACATCGT TGTAATCAAG CATGTAGAGC 
AGAATTTTCG GGACAAACAA ATAACTTTGT 

55 AGAATATAAA GGATATGAAT ATAAAGACGG 
ACTGCAAAAA TGTGATAATA ATAAATGTTC 
TCCAAAAGAA AAACCTTTTG GAAAATATGC 
TCAAGGAAAA CATGTACCTA GCATACCACC 
AGCACCAACA GTAACAGTAG ACGTTTGCAG 

60 CAATTTTTCC GACGCTTGTG GTCTAAAATA 
TATACCAAGT GACACAAAAA GTGGTGCTGG 
TGGTAGTATT TGTATCCCAC CCAGGAGGCG 
GGCTACCGCG CTCCCACAAG GTGAGGGCGC 
GCGCAATGCG TTCATCCAAT CTGCTGCAAT 



TACGGGTGGT AGTGGTGGTG GTAACAGTGG 1740 
TTATAAAGGT GAAGATGTAG TGAAAGTTGG 1800 
AAATGTAAAA AATGCAGGCG GATTATGTAT 1860 
AGGTGGAAAT ACGTCTGAAA AGGAGCCTGA 1920 
TTACTATTGG GTTGCACATA TGTTAAAAGA 1980 
ATG TTTA CAA AATGGTAACA GAATAAAATG 2040 
ATGTTTTAAA AGATGGATTA CACAAAAAAA 2100 
TAAAACGCAA AATATTAAAG GTAGAGGAGG 2160 
TGATCACGAT TATGTTCTTC AATACAATTT 2220 
AGACGCTTCC GAAGAAAAAT CCGAAAATAG 2280 
ACACCTTCGC GAAATCATTG AAAGTGAAGA 2340 
CGTCACTGAA CAAAAAAATA TAATGGATAA 2400 
TTTATGCCTA GAAATTCACG AAGATGAGGA 2460 
ATGTATCGAA GAGGGCGAAA ATTTTCGTTA 2520 
ACGATACCCC GTTCTTGCGA ACAAAGTAGC 2580 
ATTGGCTAGT CGTGCTGGTA GAAGTGCGTT 2640 
AAATGGTCGT AACGGAAGTA CATTGAAAGG 2700 
CAATGATAGT CGTGGTAATA GTGGTGGACC 2760 
TGTGCGCATG AGAATAGGAA CGGAATGGTC 2820 
CAAAAACGTC TTTTTACCTC CCCGACGAGA 2880 
AGATGTTGGT AGTGTCACTA AAAATGATAA 2940 
GCTCGCAGCA AAAACTGATG CAGCTGAGAT 3000 
ACAACTAACT GATCCAATAC AACAAAAAGA 3060 
TAGTTTTGCC GATTTAGGAG ACATTATTCG 3120 
GTCAACAGAC ATGGAAACAC GTTTGATAAC 3180 
TGGAATCAAA GACAACCCTA AATATACGGG 3240 
ATTACGAGCA GATTGGTGGG AAGCAAATAG 3300 
AACAAAAGGC ATCATATGTC CTGGTATGCC 3360 
CTGGATGACT GAATGGGCTG AATGGTATTG 3420 
AAAAAAAATC TGTGCAGATT GTATGAGTAA 3480 
CGATTGTGGA AAGTGCAAAG CAGCATGTGA 3540 
TGAACAATGG AGAAAAATAT CAGATAAATA 3600 
TTCTACTAAT CCTGGCCGTA CTGTTCTTGG 3660 
AGATTTTTTG ACCCCAATAC ACAAAGCAAG 3720 
TGCTGGTAGT CCCACTGAGA TCGGGGCCGC 3780 
CGGATATATA CACCAGGAAA TAGGATATGG 3840 
AAAAAAACAT GGTGCAACAT CAACTAGTAC 3900 
ACAACCTCGG CCGGAGTATG CTACAGCGTG 3960 
GCCGAAGAAA AAGGAAGAAA ATGTAGAGAG 4020 
GGGTAAGAAT GGAAGGACTA CAGTAGGTGA 4080 
GGATTGCAAA AAGAATATTG ACATTAGTCA 4140 
AAAACTATGT TTATATTATA TAGCACATGA 4200 
TAATTTGAAA GATGCTTTTA TTAAAACTGC 4260 
TTATAAGAGT AAGAATGATA GTGAAGCTAA 4320 
ATTTTTAAGA TCCATGATGT ACACGTTTGG 4380 
TATATCTAAA AAACAAAATG ATGTAGCTAA 4440 
AAAAGATGGC AGCAAATCTC CTAGTGGCTT 4500 
TCCAGAGATT TGGAAAGGAA TGTTATGTGC 4560 
CAAAAGAAAA ATCAAAAACG ACTACTCATA 4620 
CCCTTCCCTT GAAGAGTTTG CTGCTAAACC 4680 
AGAAGAGTTT TGTGCTGAAC GTCAGAAGAA 4740 
AATAAATTCT ACACAACAGT GTAATGATGC 4800 
ATATCAAGAA TATGTTGAAA ATAAAAAAAA 4860 
TCTAAAGGCA AATGTTCAGC CCCAAGATCC 4920 
CGTACAACCG ATACAGGGGA ATGAGTATTT 4980 
TTGCATGGAT GGAAATGTAC TTTCCGTCTC 5040 
CCATAAATAT CCTGAGAAAT GTGATTGTTA 5100 
TCCCCCCGCA CCTGTACAAC CACAACCGGA 5160 
CATAGTAAAA ACACTATTTA AAGACACAAA 5220 
CGGCAAAACC GCACCATCCA GTTGGAAATG 5280 
TGCCACCACC GGCAAAAGTG GTAGTGATAG 5340 
ACGATTATAT GTGGGGAAAC TACAGGAGTG 5400 
CGCGCCGTCC CACTCACGCG CCGACGACTT 5460 
AGAGACTTTT TTCTTATGGG ATAGATATAA 5520 
AGAAGAGAAA AAACCACAGG GTGATGGGTC ACAACAAGCA CTATCACAAC TAACCAGTAC 5580 
ATACAGTGAT GACGAGGAGG ACCCCCCCGA CAAACTGTTA CAAAATGGTA AGATACCCCC 5640 
CGATTTTTTG AGATTAATGT TCTATACATT AGGAGATTAT AGGGATATTT TAGTACACGG 5700 
TGGTAACACA AGTGACAGTG GTAACACAAA TGGTAGTAAC AACAACAATA TTGTGCTTGA 5760 
AGCGAGTGGT AACAAGGAGG ACATGCAAAA AATACAAGAG AAAATAGAAC AAATTCTCCC 5820 
AAAAAATGGT GGCACACCTC TTGTCCCAAA ATCTAGTGCC CAAACACCTG ATAAATGGTG 5880 
GAATGAACAC GCCGAATCTA TCTGGAAAGG TATGATATGT GCATTGACAT ATACAGAAAA 5940 
GAACCCTGAC ACCAGTGCAA GAGGCGACGA AAACAAAATA GAAAAGGATG ATGAAGTGTA 6000 
CGAGAAATTT TTTGGCAGCA CAGCCGACAA ACATGGCACA GCCTCAACCC CAACCGGCAC 6060 

ATACAAAACC CAATACGACT ACGAAAAAGT CAAACTTGAG GATACAAGTG GTGCCAAAAC 6120 
CCCCTCAGCC TCTAGTGATA CACCCCTTCT CTCCGATTTC GTGTTACGCC CCCCCTACTT 6180 
CCGTTACCTT GAAGAATGGG GTCAAAATTT TTGTAAAAAA AGAAAGCATA AATTGGCACA 6240 
AATAAAACAT GAGTGTAAAG TAGAAGAAAA TGGTGGTGGT AGTCGTCGTG GTGGTATAAC 6300 
AAGACAATAT AGTGGGGATG GCGAAGCGTG TAATGAGATG CTTCCAAAAA ACGATGGAAC 6360 

TGTTCCGGAT TTAGAAAAGC CGAGTTGTGC CAAACCTTGT AGTTGTTATA GAAAATGGAT 6420 
AGAAAGCAAG GGAAAAGAGT TTGAGAAACA AGAAAAGGCA TATGAACAAC AAAAAGAGAA 6480 
ATGTGTAAAT GGAAGTAATA AGCATGATAA TGGATTTTGT GAAACACTAA CAACGTCCTC 6540 
TAAAGCTAAA GACTTTTTAA AAACGTTAGG ACCATGTAAA CCTAATAATG TAGAGGGTAA 6600 
AACAATTTTT GATGATGATA AAACCTTTAA ACATACAAAA GATTGTGATC CATGTCTTAA 6660 

ATTTAGTGTT AATTGTAAAA AAGATGAATG TGATAATTCT AAAGGAACCG ATTGCCGAAA 6720 
TAAAAATAGT ATTGATGCAA CAGATATTGA AAATGGAGTG GATTCTACTG TACTAGAAAT 6780 
GCGTGTCAGT GCTGATAGTA AAAGTGGATT TAATGGTGAT GGTTTAGAGA ATGCTTGTAG 6840 
AGGTGCTGGT ATCTTTGAAG GTATTAGAAA AGATGAATGG AAATGTCGTA ATGTATGTGG 6900 
TTATGTTGTA TGTAAACCGG AAAACGTTAA TGGGGAAGCA AAGGGAAAAC ACATTATACA 6960 

AATTAGAGCA CTGGTTAAAC GTTGGGTAGA ATATTTTTTT GAAGATTATA ATAAAATAAA 7020 
ACATAAAATT TCACATCGCA TAAAAAATGG TGAAATATCT CCATGTATAA AAAATTGTGT 7080 
AGAAAAATGG GTAGATCAGA AAAGAAAAGA ATGGAAGGAA ATTACTGAAC GTTTCAAAGA 7140 
TCAATATAAA AATGACAATT CAGATGATGA CAATGTGAGA AGTTTTTTGG AGACCTTGAT 7200 
ACCTCAAATT ACTGATGCAA ACGCTAAAAA TAAGGTTATA AAATTAAGTA AGTTCGGTAA 7260 

TTCTTGTGGA TGTAGTGCCA GTGCGAACGA ACAAAACAAA AATGGTGAAT ACAAGGACGC 7320 
TATAGATTGT ATGCTTAAAA AGGTTAAAGA TAAAATTGGC GAGTGCGAAA AGAAACACCA 7380 
TCAAACTAGT GATACCGAGT GTTCCGACAC ACCACAACCG CAAACCCTTG AAGACGAAAC 7440 
TTTGGATGAT GATATAGAAA CAGAGGAGGC GAAGAAGAAC ATGATGCCGA AAATTTGTGA 7500 
AAATGTGTTA AAAACAGCAC AACAAGAGGA TGAAGGCGGT TGTGTCCCAG CAGAAAATAG 7560 

TGAAGAACCG GCAGCAACAG ATAGTGGTAA GGAAACCCCC GAACAAACCC CCGTTCTCAA 7620 
ACCCGAAGAA GAAGCAGTAC CGGAACCACC ACCTCCACCC CCACAGGAAA AAGCCCCGGC 7680 
ACCAATACCC CAACCACAAC CACCAACCCC CCCCACACAA CTCTTGGATA ATCCCCACGT 7740 
TCTAACCGCC CTGGTGACCT CCACCCTCGC CTGGAGCGTT GGCATCGGTT TTGCTACATT 7800 
CACTTATTTT TATCTAAAGG TAAATGGAAG TATATATATG GGGATGTGGA TGTATGTGGA 7860 

TGTATGTGAA TGTATGTGGA TGTATGTGGA TGTATGTGGA TGTGTTTTAT GGATATGTAT 7920 
TTGTGATTAT GTTTGGATAT ATATATATAT ATATATATGT TTATGTATAT GTGTTTTTGG 7980 
ATATATATAT GTGTATGTAT ATGATTTTCT GTATATGTAT TTGTGGGTTA AGGATATATA 8040 
TATATGGATG TACTTGTATG TGTTTTATAT ATATATTTTA TATATATGTA TTTATATTAA 8100 
AAAAGAAATA TAAAAACAAA TTTATTAAAA TGAAAAAAAG AAAAATGAAA TATAAAAAAA 8160 

AATTTATTAA AATAAAAAAA AAAAAAAAAA AAAAGGAGAA AAATTTTTTA AAAAATAATA 8220 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2710 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) HYPOTHETICAL: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Plasmodium falciparum 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Asn Val Met Val Glu Leu Ala Lys Met Gly Pro Lys Glu Ala Ala Gly 
1 5 10 15 

Gly Asp Asp Ile Glu Asp Glu Ser Ala Lys His Met Phe Asp Arg Ile 
20 25 30 

Gly Lys Asp Val Tyr Asp Lys Val Lys Glu Glu Ala Lys Glu Arg Gly 
35 40 45 

Lys Gly Leu Gln Gly Arg Leu Ser Glu Ala Lys Phe Glu Lys Asn Glu 
50 55 60 

Ser Asp Pro Gln Thr Pro Glu Asp Pro Cys Asp Leu Asp His Lys Tyr 
65 70 75 80 

His Thr Asn Val Thr Thr Asn Val Ile Asn Pro Cys Ala Asp Arg Ser 
85 90 95 

Asp Val Arg Phe Ser Asp Glu Tyr Gly Gly Gln Cys Thr His Asn Arg 
100 105 110 

Ile Lys Asp Ser Gln Gln Gly Asp Asn Lys Gly Ala Cys Ala Pro Tyr 
115 120 125 

Arg Arg Leu His Val Cys Asp Gln Asn Leu Glu Gln Ile Glu Pro Ile 
130 135 140 

Lys Ile Thr Asn Thr His Asn Leu Leu Val Asp Val Cys Met Ala Ala 
145 150 155 160 

Lys Phe Glu Gly Gln Ser Ile Thr Gln Asp Tyr Pro Lys Tyr Gln Ala 
165 170 175 

Thr Tyr Gly Asp Ser Pro Ser Gln Ile Cys Thr Met Leu Ala Arg Ser 
180 185 190 

Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Arg Asp Leu Tyr Leu Gly 
195 200 205 

Asn Pro Gln Glu Ile Lys Gln Arg Gln Gln Leu Glu Asn Asn Leu Lys 
210 215 220 

Thr Ile Phe Gly Lys Ile Tyr Glu Lys Leu Asn Gly Ala Glu Ala Arg 
225 230 235 240 

Tyr Gly Asn Asp Pro Glu Phe Phe Lys Leu Arg Glu Asp Trp Trp Thr 
245 250 255 

Ala Asn Arg Glu Thr Val Trp Lys Ala Ile Thr Cys Asn Ala Trp Gly 
260 265 270 

Asn Thr Tyr Phe His Ala Thr Cys Asn Arg Gly Glu Arg Thr Lys Gly 
275 280 285 

Tyr Cys Arg Cys Asn Asp Asp Gln Val Pro Thr Tyr Phe Asp Tyr Val 
290 295 300 

Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg 
305 310 315 320 

Lys Lys Asn Lys Lys Ile Lys Asp Val Lys Arg Asn Cys Arg Gly Lys 
325 330 335 

Asp Lys Glu Asp Lys Asp Arg Tyr Cys Ser Arg Asn Gly Tyr Asp Cys 
340 345 350 

Glu Lys Thr Lys Arg Ala Ile Gly Lys Leu Arg Tyr Gly Lys Gln Cys 
355 360 365 

Ile Ser Cys Leu Tyr Ala Cys Asn Pro Tyr Val Asp Trp Ile Asn Asn 
370 375 380 

Gln Lys Glu Gln Phe Asp Lys Gln Lys Lys Lys Tyr Asp Glu Glu Ile 
385 390 395 400 

Lys Lys Tyr Glu Asn Gly Ala Ser Gly Gly Ser Arg Gln Lys Arg Asp 
405 410 415 

Ala Gly Gly Thr Thr Thr Thr Asn Tyr Asp Gly Tyr Glu Lys Lys Phe 
420 425 430 

Tyr Asp Glu Leu Asn Lys Ser Glu Tyr Arg Thr Val Asp Lys Phe Leu 
435 440 445 

Glu Lys Leu Ser Asn Glu Glu Ile Cys Thr Lys Val Lys Asp Glu Glu 
450 455 460 

Gly Gly Thr Ile Asp Phe Lys Asn Val Asn Ser Asp Ser Thr Ser Gly 
465 470 475 480 

Ala Ser Gly Thr Asn Val Glu Ser Gln Gly Thr Phe Tyr Arg Ser Lys 
485 490 495 

Tyr Cys Gln Pro Cys Pro Tyr Cys Gly Val Lys Lys Val Asn Asn Gly 
500 505 510 

Gly Ser Ser Asn Glu Trp Glu Glu Lys Asn Asn Gly Lys Cys Lys Ser 
515 520 525 



Gly Lys Leu Tyr Glu Pro Lys Pr^Asp Lys Glu Gly Thr Thr He Thr 

535 540 

lu C 

Lys Phe Cys Asp Glu_Lys Asn Gly Asp Thr He Asn Ser Gly Gly HI 



lie Leu Lys Ser Gly Lys Gly His Asp Asp lie Glu Glu Lys Leu Asn 



Gly Thr Gly G_ly Ser Gly Gly Gly Asn Ser Gly Arg Gin Glu Le^Tyr 

585 59 0 
»lu Asp Val Val Lys V, 
600 605 



Glu Glu Trp Lys cys Tyr Lys GlyGlu Asp Val Val Lys Va^Gly His 
Asp Glu Asp Asp Glu Glu Asp Tyr Glu Asn Val Lys Asn° Ala Gly Gly 



Leu Cys lie Leu Lys Asn sin^Lys Lys Asn Lys Glu Glu Gly Gly Asn 



630 635 



Thr Ser Glu Lys Glu Pro Asp Glu He Gin Lys Thr Phe Asn Pro Phe 



Phe Tyr Tyr Trp Vai" Ala His Met Leu Lys* Asp Ser He His Tr^Lys 



Lys Lys Leu Gin Arg Cys Leu Gin Asn W Asn Arg lie Lys'Jys Gly 



660 



Asn Asn Lys Cys Asn Asn Asp Cys^Glu Cys Phe Lys Arg^Trp He Thr 



695 



Gin Lys Lys Asp Glu Trp d^Lys He Val Gin Hi^Phe Lys Thr Gin 



710 



Asn He Lys Gly Arg ely Gly Ser Asp Asn Sr Ala Glu Leu He Pro 



725 



Phe Asp His Asp Tyr" Val Leu Gin Tyr As^Leu Gin Glu Glu Phe" Leu 



740 



Lys Gly Asp Ser Glu Asp Ala Ser GlTciu Lys Ser Glu Asn^er Leu 



760 



Asp Ala Glu Glu Ala Glu Glu Leu^Lys His Leu Arg Glu lie He Glu 



775 



Ser Glu Asp Asn Asn Gin G iu Ala Ser Val Gly Gly Gly Val Thr Glu 

Lys Leu I 

His Glu 1 

Gly Asp Gly Asn Glu Cys lie Glu Gl^Gly Glu Asn Phe Ar" Tyr Asn 



Gin Lys Asn He Me^ Asp Lys Leu Leu Asn Tyr Glu Lys Asp Glu 15a 



Asp Leu Cys W Glu He His Glu Asp Gl^Glu Glu Glu Lys Glu^ys 



840 



Pro cys Ser Gly Glu Ser Gly ten Lys Arg Tyr Pro Va¥ Leu Ala Asn 



855 



Lys val. Ala Tyr Gin Met hIVm. Lys Ala Lys Thr^Gln Leu Ala Ser 



870 



Arg Ala Gly Arg ser Ala Leu Arg Gly Asp He Ser Leu Ala Gin Phe 



885 



Lys Asn Gly Arg Asn Gly Ser Thr Leu Lys* Gly Gin He Cys Lys" He 



900 905 



Asn Glu Asn Tyr Ser Asn Asp SerArg Gly Asn Ser Gly Gly Pro Cys 
Thr Gly Lys Asp Gly Asp His^G^Gly Val Arg Met Arg^Ile Gly Thr 
Glu Trp ser Asn He Glu Gly Lys Lys Gin Thr Se^ Tyr Lys Asn Val 



Phe Leu Pro Pro Arg Glu His Met Cys Thr Ser Asn Leu Glu Asn 



965 



Leu Asp Val Gly Ser Val Thr Lys Asn Asplys Ala Ser His Se^Leu 



980 



985 99 0 



Leu Gly Asp Val Gin Leu Ala AlaLys Thr Asp Ala Ala Glu He He 

Lys Arg Tyr Lys Asp Gin Asn AsHle Gin Leu Thr AspTro He Gin 

1015 1020 

Gln Lys Asp Gln Glu Ala Met Cys Arg Ala Val Arg Tyr Ser Phe Ala 
1025 1030 1035 1040 

Asp Leu Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp Glu Asp Lys 
1045 1050 1055 

Ser Ser Thr Asp Met Glu Thr Arg Leu Ile Thr Val Phe Lys Asn Ile 
1060 1065 1070 

Lys Glu Lys His Asp Gly Ile Lys Asp Asn Pro Lys Tyr Thr Gly Asp 
1075 1080 1085 

Glu Ser Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp Trp Trp Glu 
1090 1095 1100 

Ala Asn Arg His Gln Val Trp Arg Ala Met Lys Cys Ala Thr Lys Gly 
1105 1110 1115 1120 

Ile Ile Cys Pro Gly Met Pro Val Asp Asp Tyr Ile Pro Gln Arg Leu 
1125 1130 1135 

Arg Trp Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala Gln Ser Gln 
1140 1145 1150 

Glu Tyr Asp Lys Leu Lys Lys Ile Cys Ala Asp Cys Met Ser Lys Gly 
1155 1160 1165 

Asp Gly Lys Cys Thr Gln Gly Asp Val Asp Cys Gly Lys Cys Lys Ala 
1170 1175 1180 

Ala Cys Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp Asn Glu Gln Trp 
1185 1190 1195 1200 

Arg Lys Ile Ser Asp Lys Tyr Asn Leu Leu Tyr Leu Gln Ala Lys Thr 
1205 1210 1215 

Thr Ser Thr Asn Pro Gly Arg Thr Val Leu Gly Asp Asp Asp Pro Asp 
1220 1225 1230 

Tyr Gln Gln Met Val Asp Phe Leu Thr Pro Ile His Lys Ala Ser Ile 
1235 1240 1245 

Ala Ala Arg Val Leu Val Lys Arg Ala Ala Gly Ser Pro Thr Glu Ile 
1250 1255 1260 

Ala Ala Ala Ala Pro Ile Thr Pro Tyr Ser Thr Ala Ala Gly Tyr Ile 
1265 1270 1275 1280 

His Gln Glu Ile Gly Tyr Gly Gly Cys Gln Glu Gln Thr Gln Phe Cys 
1285 1290 1295 

Glu Lys Lys His Gly Ala Thr Ser Thr Ser Thr Thr Lys Glu Asn Lys 
1300 1305 1310 

Glu Tyr Thr Phe Lys Gln Pro Pro Pro Glu Tyr Ala Thr Ala Cys Asp 
1315 1320 1325 

Cys Ile Asn Arg Ser Gln Thr Glu Glu Pro Lys Lys Lys Glu Glu Asn 
1330 1335 1340 

Val Glu Ser Ala Cys Lys Ile Val Glu Lys Ile Leu Glu Gly Lys Asn 
1345 1350 1355 1360 

Gly Arg Thr Thr Val Gly Glu Cys Asn Pro Lys Glu Ser Tyr Pro Asp 
1365 1370 1375 

Trp Asp Cys Lys Asn Asn Ile Asp Ile Ser His Asp Gly Ala Cys Met 
1380 1385 1390 

Pro Pro Arg Arg Gln Lys Leu Cys Leu Tyr Tyr Ile Ala His Glu Ser 
1395 1400 1405 

Gln Thr Glu Asn Ile Lys Thr Asp Asp Asn Leu Lys Asp Ala Phe Ile 
1410 1415 1420 

Lys Thr Ala Ala Ala Glu Thr Phe Leu Ser Trp Gln Tyr Tyr Lys Ser 
1425 1430 1435 1440 

Lys Asn Asp Ser Glu Ala Lys Ile Leu Asp Arg Gly Leu Ile Pro Ser 
1445 1450 1455 

Gln Phe Leu Arg Ser Met Met Tyr Thr Phe Gly Asp Tyr Arg Asp Ile 
1460 1465 1470 

Cys Leu Asn Thr Asp Ile Ser Lys Lys Gln Asn Asp Val Ala Lys Ala 
1475 1480 1485 

Lys Asp Lys Ile Gly Lys Phe Phe Ser Lys Asp Gly Ser Lys Ser Pro 
1490 1495 1500 

Ser Gly Leu Ser Arg Gln Glu Trp Trp Lys Thr Asn Gly Pro Glu Ile 
1505 1510 1515 1520 

Trp Lys Gly Met Leu Cys Ala Leu Thr Lys Tyr Val Thr Asp Thr Asp 
1525 1530 1535 

Asn Lys Arg Lys Ile Lys Asn Asp Tyr Ser Tyr Asp Lys Val Asn Gln 

1545 a Pr M 

Ser Gin Asn Gly Asn Pro Ser Lju^Glu Glu Phe Ala Ala Lys Pro Gin 



A 540 1545 1550 

i 

Phe Leu Arg Trp Met lie Glu T^Gly Glu.Glu Phe Cy^Ala Glu Arg 



1575 

Gin Lys Lys Glu Asn Ilelle Lys Asp Ala Cys Asn^Glu lie Asn Ser 

1590 1595 1rnrt 

Thr Gin Gin Cys Asn Asp Ala Lys His Arg Cys Asn Gin Ala Cys Arg 

1605 1610 icic 

Ala Tyr Gin oiu Tyr Val Glu Asn Lys . Lys Lys Glu Phe Ser Gly G !n 

Thr Asn Asn Phe Val Leu Lys Ala AsnVal Gin Pro Gin As" Pro Glu 

- 1635 1640 164* 

Tyr Lys Gly Tyr Glu Tyr Lys^sp Gly Val Gin Pro lie Gin Gly Asn 

lies'** LSU ^? 0 CyS ASP Asn Asn ^ s Cys Ser Cys Met Asp 
Gly Asn Val Leu Serjal Ser Pro Lys Glu Ly^Pro Phe Gly Lys Tyr° 

Ala His Lys Tyr Pro Glu Lys Cys Asp Cy^ T°yr Gin Gly Lys His* Val 

1700 1705 - _ 

Pro ser lie Pro Pro Pro Pro Pro Pro Val Gin Pro Gin Pro Glu Ala 

- x /lb 1720 175c 

Pro Thr Val Thr Val Asp Val Cys Ser He Val Lys Thr Leu Phe Lys 

Asp^hr Asn Asn Phe Ser Asp Ala Cys Gly Leu Lys ^r Gly Lys Thr 



Ala Pro Ser Ser Trp Lys°Cys He Pro Ser ^Thr Lys Ser Gly VlT 

1765 1770 1 n 7c 

Gly Ala Thr Thr Gly Lys Ser Gly Ser Asp Ser Gly Ser He Cys lie 

Pro Pro Arg Arg Arg Arg Leu ^V^ly Lys Leu Gin Glu" 7 Trp Ala 

i8?o LeU Pr ° Gln Gly ^Gly^la Ala Pro Ser Hi" Ser Arg Ala 
AspAsp Leu Arg Asn Jla^Phe He Gin Ser Ala Ala lie Glu Thr Phe 
Phe Leu Trp Asp Arg Tyr Lys Glu Glu Lys Ly^Pro Gin Gly Asp Gly° 



Ser Gin Gin Ala Leu Ser Gin Leu Thr Ser Thr Tyr Ser Asp Asp Glu 

_ n , 186 0 1865 1870 

Glu Asp Pro Pro Asp Lys Leu Leu Gin Asn Gly Lys He Pro Pro Asp 



1845 1850 1855 

ir Ser Thr Tyr Ser Asp A 
L865 i 87 ( 
1875 - . - In Asn Gly Lys lie Pro Pi 

PhS ^ 0 Arg Leu Met Phe Tyrjnr^eu Gly Asp Tyr Ar^isp He Leu 

YtL His Gly Gly Asn Thr Ser Asp Ser Gly Asn Thr Asn Gly Ser Asn 
t 1910 1915 io 9n 

Asn Asn Asn Ile Val Leu Glu Ala Ser Gly Asn Lys Glu Asp Met Gln 
1925 1930 1935 

Lys Ile Gln Glu Lys Ile Glu Gln Ile Leu Pro Lys Asn Gly Gly Thr 
1940 1945 1950 

Pro Leu Val Pro Lys Ser Ser Ala Gln Thr Pro Asp Lys Trp Trp Asn 
1955 1960 1965 

Glu His Ala Glu Ser Ile Trp Lys Gly Met Ile Cys Ala Leu Thr Tyr 
1970 1975 1980 

Thr Glu Lys Asn Pro Asp Thr Ser Ala Arg Gly Asp Glu Asn Lys Ile 
1985 1990 1995 2000 

Glu Lys Asp Asp Glu Val Tyr Glu Lys Phe Phe Gly Ser Thr Ala Asp 
2005 2010 2015 

Lys His Gly Thr Ala Ser Thr Pro Thr Gly Thr Tyr Lys Thr Gln Tyr 
2020 2025 2030 

Asp Tyr Glu Lys Val Lys Leu Glu Asp Thr Ser Gly Ala Lys Thr Pro 
2035 2040 2045 

Ser Ala Ser Ser Asp Thr Pro Leu Leu Ser Asp Phe Val Leu Arg Pro 
2050 2055 2060 

Pro Tyr Phe Arg Tyr Leu Glu Glu Trp Gly Gln Asn Phe Cys Lys Lys 
2065 2070 2075 2080 

Arg Lys His Lys Leu Ala Gln Ile Lys His Glu Cys Lys Val Glu Glu 
2085 2090 2095 

Asn Gly Gly Gly Ser Arg Arg Gly Gly Ile Thr Arg Gln Tyr Ser Gly 
2100 2105 2110 

Asp Gly Glu Ala Cys Asn Glu Met Leu Pro Lys Asn Asp Gly Thr Val 
2115 2120 2125 

Pro Asp Leu Glu Lys Pro Ser Cys Ala Lys Pro Cys Ser Ser Tyr Arg 
2130 2135 2140 

Lys Trp Ile Glu Ser Lys Gly Lys Glu Phe Glu Lys Gln Glu Lys Ala 
2145 2150 2155 2160 

Tyr Glu Gln Gln Lys Asp Lys Cys Val Asn Gly Ser Asn Lys His Asp 
2165 2170 2175 

Asn Gly Phe Cys Glu Thr Leu Thr Thr Ser Ser Lys Ala Lys Asp Phe 
2180 2185 2190 

Leu Lys Thr Leu Gly Pro Cys Lys Pro Asn Asn Val Glu Gly Lys Thr 
2195 2200 2205 

Ile Phe Asp Asp Asp Lys Thr Phe Lys His Thr Lys Asp Cys Asp Pro 
2210 2215 2220 

Cys Leu Lys Phe Ser Val Asn Cys Lys Lys Asp Glu Cys Asp Asn Ser 
2225 2230 2235 2240 

Lys Gly Thr Asp Cys Arg Asn Lys Asn Ser Ile Asp Ala Thr Asp Ile 
2245 2250 2255 

Glu Asn Gly Val Asp Ser Thr Val Leu Glu Met Arg Val Ser Ala Asp 
2260 2265 2270 

Ser Lys Ser Gly Phe Asn Gly Asp Gly Leu Glu Asn Ala Cys Arg Gly 
2275 2280 2285 

Ala Gly Ile Phe Glu Gly Ile Arg Lys Asp Glu Trp Lys Cys Arg Asn 
2290 2295 2300 

Val Cys Gly Tyr Val Val Cys Lys Pro Glu Asn Val Asn Gly Glu Ala 
2305 2310 2315 2320 

Lys Gly Lys His Ile Ile Gln Ile Arg Ala Leu Val Lys Arg Trp Val 
2325 2330 2335 

Glu Tyr Phe Phe Glu Asp Tyr Asn Lys Ile Lys His Lys Ile Ser His 
2340 2345 2350 

Arg Ile Lys Asn Gly Glu Ile Ser Pro Cys Ile Lys Asn Cys Val Glu 
2355 2360 2365 

Lys Trp Val Asp Gln Lys Arg Lys Glu Trp Lys Glu Ile Thr Glu Arg 
2370 2375 2380 

Phe Lys Asp Gln Tyr Lys Asn Asp Asn Ser Asp Asp Asp Asn Val Arg 
2385 2390 2395 2400 

Ser Phe Leu Glu Thr Leu Ile Pro Gln Ile Thr Asp Ala Asn Ala Lys 
2405 2410 2415 

Asn Lys Val Ile Lys Leu Ser Lys Phe Gly Asn Ser Cys Gly Cys Ser 
2420 2425 2430 

Ala Ser Ala Asn Glu Gln Asn Lys Asn Gly Glu Tyr Lys Asp Ala Ile 
2435 2440 2445 

Asp Cys Met Leu Lys Lys Leu Lys Asp Lys Ile Gly Glu Cys Glu Lys 
2450 2455 2460 

Lys His His Gln Thr Ser Asp Thr Glu Cys Ser Asp Thr Pro Gln Pro 
2465 2470 2475 2480 

Gln Thr Leu Glu Asp Glu Thr Leu Asp Asp Asp Ile Glu Thr Glu Glu 
2485 2490 2495 

Ala Lys Lys Asn Met Met Pro Lys Ile Cys Glu Asn Val Leu Lys Thr 
2500 2505 2510 

Ala Gln Gln Glu Asp Glu Gly Gly Cys Val Pro Ala Glu Asn Ser Glu 
2515 2520 2525 

Glu Pro Ala Ala Thr Asp Ser Gly Lys Glu Thr Pro Glu Gln Thr Pro 
2530 2535 2540 

Val Leu Lys Pro Glu Glu Glu Ala Val Pro Glu Pro Pro Pro Pro Pro 
2545 2550 2555 2560 

Pro Gln Glu Lys Ala Pro Ala Pro Ile Pro Gln Pro Gln Pro Pro Thr 
2565 2570 2575 

Pro Pro Thr Oln ieu- Leu Asp Asn Pro His Val Leu Thr Ala Leu 111 



5 Thr ser Thr Leu Ala Trp Ser Vainly lie Gly Phe Ala Thr Phe Thr 

Tyr Phe^Tyr Leu Lys Val Asn^Ser lie Tyr Met Gly^Met Trp Met 

10 Jjf 5 Val ASP Val ^ ZH^ 3 Met Tr P Met Tyr vA^p Val Cys Gly 

Cys Val Leu Trp Il^Cys lie Cys Asp TyrV^^rp He Tyr He Tyr° 
He Tyr He Cys Leu Cys He Cys Var Phe^G^ly Tyr lie Tyr Va^Tyr 

15 val Tyr Asp Phe Leu Tyr Met Tyr Le^Trp Val Lys Asp Il^Tyr He 

^675 2680 268S 

Trp 2So Tyr ^ Val P o h c e J yr 116 Ile Leu He Cys He 

^e»u 2695 2700 

Tyr Ile Lys Lys Glu Ile 
2705 2710 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19124 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

ACATTTTTTC GTAATATATA TATATATATA TATATATAAT TCTCTTTTTC TAATATATAT 60 
™T^S^m CTA ^ > T TTCGATTTT TTCATTTTTT TCCAGTATTA ATTTATTTAT TTATTTCTCA i ?r 

An I a ^tataatatattatttaaatgtgtatttatatatgtStt^ ll[ 
11%™1™Z C CGAGCGAAAA aaaatatata atctcatata aaaattattt ataItIca^? lie 
' wiltSS T^cctattaa aataaattaa tataatatac aataatattt cttct?St*t lie 

rZ^^I 2 ^ •AACTAATTTC TTATTTTTAT TTAACTTTAT TCCTTTTTAA TTTOTAAT? 

?*T™ GCA AACAAAAAAC ataaagtaat tctacatatc aacaaaaaaa aaISa^aI 42? 

« AAAAAAAAAA ATTTATTATA ATATAATAAA AAATATAAAG ACMACGTTC ACTTATT^ tic 
^$^ ATT ^"ATTACGATT AAAACATATT GAGATTATAA TAATATAATT TAACaSgaI 540^ 
AG ^ G ^3> AA ^' A ATACATTTTT TTTTTTTTTT TGATATGTAA TTCAACATAT ATATATATAT U S 
A ™^r TT AATTTAATTA AATAAAATTC CTTATTATTC ATATTGTTTC TtScISI til 
^ GAAA I AT TAAAA ATAAT TTTCGATTTT ATCGATATAT TTATGTCCTT TATA^SS ?IS 

•50 I A ^ AG SI CTT TATAA CTATT GATTAATAGA AGGTAATAGC CTAATAATAT AAAtIcTCOT ill 
iS^^ T TCATT TATAT ATTTCAAATA TATTTCGATG GTTTATTTTC AAATACAATT III 
AATTAGATTT CTTAAATATT TCTTCATTTA TTCATTTTTA TAG C ATATAC ATCCaSStI Ill 
TAAATTATTA ATAAAAAATT TTTATTTTAA TATATAATAA CAATTTOCAT aSSa^CT III 
TITCACACAA CATTTAAGTT GTCATAATGT AACACATTAA ATAATATATT ACH^M^AT 1020 

W £1™^™ ^AATTATATA TTAAATAAAA ATGTATTATC GCCTGTATTA TCATAGTAtI llll 

55 TATAATGTTG TATAACGCTT CAAAATATAT ATAATAATAT AATTAAAAAT ATATATATA? 

J AAT I AA Z TA TTTTGTTATG TTATGTAATA ATGCAATTAA TATAAGATAA A^CTaSg ^2oS 
^ A I3 ATTTA AAATATATAT ATATATATAT ATATATATAT ATATTAGTAT ATGTTATCAA llll 
^JTATAA TATGTAAATT ATTAATAAAA TATATTTGTA TAACATACAA GACTAAAGAA llll 

Ml AA< T TATAC AA TCTGG TATCT AATAGTATAT ATATATAATA TCTTTTTTAT TTAATTGTTC llll 
r^ZZir TTTTTTTTAA ATAATAATAA ATATTAATAT ATTTTTTTTC ATAaStI? llll 
GATTTAGTAT TTTAATAATA AATAAATCTT TTAAAAAACT TCAAAACATT TTTGCATAAA 1500 
^TATTAA TATTAGTAAC CACCTAGATA AATTAGAGAG AAACGTAGAA SSSSaIS ls 60 
AAAATTAGAA CAAAAAGAAT ATTACAAAAA ATAATAAAAT TAAATTATTT CTTTACTATT 1620 
AATTTAAAGT TTTTTTTCAT ATCATATATT ATGATACACA ATGTTTGTTG TTAAATGTTT 1680 
TATATACATG CAATGATATG TTTCTGTTGG AATATGTATT ATATACTTAT ATGTTCTAAT 1740 
AAATGTATTG TACACCTTTA GCAACTATTA CTACACACAT TTTTATATAA TTTATAACAG 1800 
GAAAATATGT TATATTATTA CAATATCTTA ATGTGTTTTT GCAAAAATAT AAAAAACAAG 1860 
AAAATTACAA TTGTAATTAA TCGTATGACA TAAAATTATA TTATATTAGA AATTAAAATT 1920 
CAAAATTATA AAAAATATGG AAATGTTTTG TTATATTATT TTTTTAAAAA TTTAATTATT 1980 
TTATTTTATT ATTTATTTTT TTTTTTTTTT GTGTTCTAAA TAAAAAGGCA AATATGATTC 2040 
AAGTAAAAAA TATATATATT TACATAATGG CAAAATAATT GTTTATTATA TTATATGACT 2100 
ATAATAATAT TTTAGATTAA ACATATGTAA TTCATTTAAC AGAATAAAAT AAAATATTAT 2160 
ATATATATAT TAATTATTAA GTTATAGATT TAATAAAAAT ATATTATACA TATGAGATTA 2220 
AAAATGAAAG TTCACTACAG TAATATATTA TTATATGTCG TCAATTTAAG TATATTCTTA 2280 
ATATCACGTA TGCACTAAAT AATGACAATA ATAATATATA TGTAACATTT TATAATTGAT 2340 
GTAAATAAAA AAATATACAT ATATACAAAA ACATATATGA TATTTACATT CTTTTTTATA 2400 
GATAAATATC CAGAAGAACT ATTACATCAC TTCACTTCAT ATACCAAACA CGAAAAAAAT 2460 
ACAACCACTA GGTTATTATG CGAATGTGAC TTATATACGT CCATTTATGA TAATGACCCG 2520 
GAAATGATAT TAGTGATGGA AAATTTCAAT AAACAGACAG AAGAAAGGTT TCATGAATAC 2580 
AATGAACGCA TGCAAGAAAA ACGAAAAATA TGTAAAGAAC AATGCGAAAA GGATATACAA 2640 
AAAATTATTT TAAAAGATAA AATCGAAAAG GAATTAACAG AAAAGTTAGA GGCATTGGAA 2700 
ACGAATATAA AGACTGAGGA TATACCTACT TGTGTATGCG AAAAATCAGT AGCAGATAAA 2760 
GTGGAAAAAA CGTGTTTGAA ATGTGGAGGT ATATTGGGTG TTGGTGTGAC TCCATCTTTA 2820 
GGTTTATTAG GAGAAATAGG TGGACTTGTT ATAAATAATT GGACAAATAC TCCTTTTTAT 2880 
AAAGCTTTTC TTACTTTTGC TCAAAAGGAA GGTATAGCTG CCGGTAAAAT TGCTAGTGAT 2940 
ACTGCTCGTA TTGATACAGT TATTTAAGGA ATAATATCAA ATTTTGATGT GCACACTATA 3000 
AATGGTTCTA CGTTGGGGAA AGTTATTACC GTAGAAGCTC TTAAGGATGA CACTACTCTT 3060 
ACTACGGCAC TATATAATGA ATATGTAAGC ATGTGTGTAA ATAGGAACCC TGTCGAAGAC 3120 
AAATTAATTT GTGCTTTTGG GATGAGAGAC GGTCTAGTTG CAGGGCAATA TGCTTCATCG 3180 
CGAGACGTTA TAGGATCAAG TGTAAAAGGA ATTATTAGAA AAGCTGCAAA CGCTGCTTCA 3240 
CAAGCTGCTG AGACAGCTGC TAACGAAACT ACTTCCGGAA TGATCGAAGC CGAGTTAAGT 3300 
AAAATAACAT CTGCAGGTGC TAATTTACAC AGTGCAATTA CTTACTCAGT AACTGCGATA 3360 
TTGGTTATAG TTTTGGTTAT GGTAATTATT TATTTAATAT TACGTTATCG TAGAAAAAAA 3420 
AAAATGAAGA AAAAATTGCA ATATATAAAA TTATTAAAGG AATAGATATA CGATGTCGAG 3480 
CTATTAGCGG TAATTTAAAG TATTGTGAAT TTTTCATTTA ATATGCTATG ATCATTTGAT 3540 
AATTAATTTT TTTTTATAAT ATTATATTTT TTTATACCTT GGATTCTTAC ATTGTTTTAT 3600 
TATTATATGA TTATTTAATT ATTATACTTA TATATATATA TATTTTTACA TTAAGATATT 3660 
ATATATGTAT CTATCTATCT ATCTATCTAT ATATATATAT ATATATATAT ATTATAATAA 3720 
TTATTATTAT TAGATGCATA TTAGTGATGA TTATAATAAT AACCTATTGA AGAGAATAGA 3780 
ACATAATAAT ATATTAAATT AATAGAACTT CATTTTTATT GTTATATGTA TATAAAAATA 3840 
AGAAATTTGA AAAAGTAATT TACACATGAT AATGTATTTT ATTTTATTTG TGTTGTTTTA 3900 
TATTTATTTA TAAAAATTGT TTAATATAAG TTGTTATTAT AATTTTTTAA TATGGCACCA 3960 
TTAGCTTTCC ATTATACAAA TATATATTTC CTCATTAGAA TCTGAATATT TATTGTATTA 4020 
TAAAAAAAGT ATAATATAAT AAAATATCTA AGATTTTTTC TAATTTGTTT AATTTATAAT 4080 
AAATTTTAAT TTTATACGAT AGAATAAATT ATAATCAACA TATATATATG TATTCATCTT 4140 
AAGAACCTAT TACAATATAG TAACAACTGG TTCCTTTTTA TTATAAATAA CATAAGAATG 4200 
TGTAAAAGGA TAGTTGTTAA AGGCTTTTTT AATATTGATT ATAAATGTTT GTAAGATATA 4260 
TATAATAGAT ATCTTAACAT ACAACTTTGC ATAATTGTAA TTAAAAAAAT ATATATAATA 4320 
AGAAATATTA TAAATAATAT TATAAAAAAT TAAGCATAAA TGTCACAATA AATTTTTTTT 4380 
TATTAATTTA ATTTTATTTT ATTGTTCTAA AATATATTGA TTATGAGAAT ATTATTTGTG 4440 
TCTAATATAA TTAAGATATT TCTAATATTA ATTTATATAT ATATATTTAA AAGTATTTTA 4500 
AGAATAATTT TTTACTTATT TATTATAATA TGAAATATGC ATGGAGTATA TATAAATATT 4560 
GATGACAAAA AAAAAACTTT TAAAATGGAA AATATGCATA TAATAAAATA CTATATAGTA 4620 
TAATTGGTGA AATAGTTGTA ACTTATACAA ACATGTTGCA TTCATAATTT AGAGATTATG 4680 
TAATATTGTT TATGTATCGT AATATATATT AATATAATTG TTTTTTTAGT ATGTATGGTA 4740 
TTCTAATAAT ATATTCATAT GTAGTCATAG TGTCAATGAA TATAAAATAT GGTATATTTA 4800 
TATTATTGTA TATATTAAAT AAGTAACACA GAACATTATA TATAGTAATA AATAGAAGAA 4860 
ATAATATATT TTTATGTTAT ATATTATTAG TTATTATAAA GGGGAAAATT CATAATATTT 4920 
ATGAAAATTT TTGTATATGA TATAGTTATA AGTTAAAAAA AAAAAAAAAC AAGAACAAAA 4980 
ATGGAAAGCA TAAAAAATGT TACTGTAATA GGATAAAATA TATTATATAA AATGTTTATT 5040 
TTATCTTAAA AAGGTTCCTA TTATAACATT AAAAAAAATT TGTCCCATTT TATAAATAAT 5100 
TAACTACATT TACATAATGA AATTTCGATT TTGTGTTTTT TTGATGAATA TTATGGACTA 5160 
ATTATTTATA TGTGAATGCG TTCTATATAA TAATAATAAT TTTATTTAAA AAAATGAAAA 5220 
ATAAGAAATA AATATCCTGA TTTTGTAGTT CCAATAGCTT AATATAATTA TGGACTCATA 5280 
TATATATTAT ATATATCTTT ACAACAAGTA ATAAGTAAAT ATTATTTTAA TCTTAATAAG 5340 
GAAAATAAAA ATAATAAAAT AAGAATACTG AATAATAAGT CATATTATAC ATTTTTTAAA 5400 
AATGTAACAT AATTACAAAT ACGTAACATG TATTATAGAA ATAATAAGAA TTTAATATTA 5460 
AGGATAAATA TAAATATTTA AAATTATATT TTTTTATGTC AATTTATGTT ATATTATATT 5520 






WO 96/40766 

PCT/US96/09508 


15 AAAATGCATG 2Ss££aA 2SS ^I??^ 1 " T<»!»TTCaT 6360 

TATAATATAA TATAATATAA TAAtSSSt" TT^SSrt? SI^S TT* 1 **"" " 20 

ssjsi ssss? sasss sail 35=1 -ass- 
■sssssj ssss -?sg^ E^™ ™™ ™' 

ATAAAAAAAA TAATATATAT aSatSSS JXK2£« H™CTTAA TAAATAAAAC 6840 

- ssss iilE I™E ™ss esses :jjs 


ATATGAAAAA AAATTTTATG ACGAACTTAA TAAAAGTGAA TATAGAACCG TTGATAAATT Itln 

SJSSS™ ^S^™ GTAGTAATGA ATGGGAAGAG AAAAATAATG Go5£?£Sa* 888^ 
^S-SS**** CTTTATGAGC CTAAACCCGA CAAAGAAGGT ACTACTATTA CAATCCTTaa llln 
AAGTGGTAAA GGACATGATG ATATTGAAGA AAAATTAAAC AAATTTTGT? Arrl^at* !«™ 
60 A TAAATAGTG GTGGTAGTGG SSS SJSSfi^ 90^0 

1™^°^ TTGTATG AAG AATGGAAATG TTATAAAGGT GAAGATGTAG TGAAAgSS I?Sn 
A ^GATGAG GATGACGAGG AGGATTATCA AAATGTAAAA AATGCAGGCG GjS^gSS lilt 
A ^ AAAAAA S CAAAAAAAGA ATAAAGAAGA AGGTGGAAAT ACGTCTGAAA SSStcSSJ Ilia 
TGAAATCCAA AAGACATTCA ATCCTTTTTT TTACTATTGG GTTGCACATA TCTtSSSJ 
TOCATAQT TGGAAAAAAA AACTTCAGAG ATGTTTACAA AaS^SS SSa^TC 9360 
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TGGAAACAAT AAATGTAATA ATGATTGTGA 
AGACGAATGG GGGAAAATAG TACAACATTT 
TAGTGACAAT ACGGCAGAAT TAATCCCATT 
GCAAGAAGAA TTTTTGAAAG GCGATTCCGA 
5 TCTGGATGCA GAGGAGGCAG AGGAACTAAA 
CAATAATCAA GAAGCATCTG TTGGTGGTGG 
ATTGCTCAAC TACGAAAAAG ACGAAGCOGA 
AGAGGAAAAA GAAAAAGGAG ACGGAAACGA 
TAATCCATGT AGTGGCGAAA GTGGTAACAA 

10 GTATCAAATG CATCACAAGG CAAAGACACA 
GAGAGGTGAT ATATCCTTAG CGCAATTTAA 
ACAAATTTGC AAAATTAACG AAAACTATTC 
ATGTACAGGC AAAGATGGAG ATCACGGAGG 
AAATATTGAA GGAAAAAAAC AAACGTCATA 

15 ACACATGTGT ACATCCAATT TAGAAAATTT 
GGCTAGCCAC TCATTATTGG GAGATGTTCA 
AATAAAACGC TATAAAGATC AAAATAATAT 
CCAGGAGGCT ATGTGTCGAG CTGTACGTTA 
AGGAAGAGAT ATGTGGGATG AGGATAAGAG 

20 CGTATTTAAA AACATTAAAG AAAAACATGA 
TGATGAAAGC AAAAAGCCCG CATATAAAAA 
ACATCAAGTG TGGAGAGCCA TGAAATGGGG 
AGTTGACGAT TATATCCCCC AACGTTTACG 
TAAAGCGCAA TCACAGGAGT ATGACAAGTT 

25 GGGTGATGGA AAATGTACGC AAGGTGATGT 
T AAAT ATAAA G AGG AAATAG AAAAATGGAA 
CAATCTATTA TACCTACAAG CAAAAACTAC 
TGATGACGAT CCCGACTATC AACAAATGGT 
TATTG CCGCA CGTGTTCTTG TTAAACGTGC 

30 CGCCCCGATC ACCCCCTACA GTACTGCTGC 
GGGGTGCCAG GAACAAACAC AATTTTGTGA 
CACGAAAGAA AACAAAGAAT ACACCTTTAA 
TGATTGCATA AATAGGTCGC AAACAGAGGA 
TGCCTGCAAA ATAGTGGAGA AAATACTTGA 

35 ATGTAATCCA AAAGAGAGTT ATCCTGATTG 
TGATGGTGCT TGTATGCCTC CAAGGAGACA 
GAGTCAAACA GAAAATATAA AAACAGAGGA 
AG CAGCAGAA ACTTTTCTTT CATGGCAATA 
AAT ATT AG AT AGAGGCCTTA TTCCATCCCA 

40 AGATTATAGA GATATATGTT TGAACACAGA 
GGCAAAAGAT AAAATAGGTA AATTTTTCTC 
ATCACGCCAA GAATGGTGGA AAACAAATGG 
CTTAACAAAA TACGTCACAG ATACCGATAA 
CGATAAAGTC AACCAATCCC AAAATGGCAA 

45 TCAATTTCTA CGTTGGATGA TCGAATGGGG 
GGAAAATATC ATAAAAGATG CATGTAATGA 
GAAACATCGT TGTAATCAAG CATGTAGAGC 
AGAATTTTCG GGACAAACAA ATAACTTTGT 
AGAATATAAA GGATATGAAT ATAAAGACGG 

50 ACTGCAAAAA TGTGATAATA ATAAATGTTC 
TCCAAAAGAA AAAC CTTTTG GAAAATATGC 
TCAAGGAAAA CATGTACCTA GCATACCACC 
AGCACCAACA GTAACAGTAG ACGTTTGCAG 
CAATTTTTCC GACGCTTGTG GTCTAAAATA 

55 TATACCAAGT GACACAAAAA GTGGTGCTGG 
TGGTAGTATT TGTATCCCAC CCAGGAGGCG 
GGCTACCGCG CTCCCACAAG GTGAGGGCGC 
GCGCAATGCG TTCATCCAAT CTGCTGCAAT 
AGAAGAGAAA AAACCACAGG GTGATGGGTC 

60 ATACAGTGAT GACGAGGAGG ACCCCCCCGA 
CGATTTTTTG AGATTAATGT TCTATACATT 
TGGTAACACA AGTGACAGTG "GTAACACAAA 
AGCGAGTGGT AACAAGGAGG ACATGCAAAA 
AAAAAATGGT GGCACACCTC TTGTCCCAAA 



ATGTTTTAAA AGATGGATTA CACAAAAAAA 9420 
TAAAACGCAA AATATTAAAG GTAGAGGAGG 9480 
TGATCACGAT TATGTTCTTC AATACAATTT 9540 
AGAC GCT TCC GAAGAAAAAT CCGAAAATAG 9600 
ACACCTTCGC GAAATCATTG AAAGTGAAGA 9660 
CGTCACTGAA CAAAAAAATA TAATGGATAA 9720 
TTTATGCCTA GAAATTCACG AAGATGAGGA 9780 
ATGTATCGAA GAGGGCGAAA ATTTTCGTTA 9840 
ACGATACCCC GTTCTTGCGA ACAAAGTAGC 9900 
ATTGGCTAGT CGTG CTGGTA GAAGTGCGTT 9960 
AAATGGTCGT AACGGAAGTA CATTGAAAGG 10020 
CAATGATAGT CGTGGTAATA GTGGTGGACC 10080 
TGTGCGCATG AGAATAGGAA CGGAATGGTC 10140 
CAAAAACGTC TTTTTACCTC CCCGACGAGA 10200 
AGATGTTGGT AGTGTCACTA AAAATGATAA 10260 
GCTCGCAGCA AAAACTGATG CAGCTGAGAT 10320 
ACA ACTAA CT GATCCAATAC AACAAAAAGA 10380 
TAGTTTTGCC GATTTAGGAG ACATTATTCG 10440 
CTCAACAGAC ATGGAAACAC GTTTGATAAC 10500 
TGGAATCAAA GACAACCCTA AATATACCGG 10560 
ATTACGAGCA GATTGGTGGG AAGCAAATAG 10620 
AAGAAAAGGG ATGATATGTG GTGGTATGGG 1-0680 
CTGGATGACT GAATGGGCTG AATGGTATTG 10740 
AAAAAAAATC TGTGCAGATT GTATGAGTAA 10800 
CGATTGTGGA AAGTGCAAAG CAGCATGTGA 10860 
TGAACAATGG AGAAAAATAT CAGATAAATA 10920 
TTCTACTAAT CCTGGCCGTA CTGTTCTTGG 10980 
AGATTTTTTG ACCCCAATAC ACAAAGCAAG 11040 
TGCTGGTAGT CCCACTGAGA TCGCCGCCGC 11100 
CGGATATATA CACCAGGAAA TAGGATATGG 11160 
AAAAAAACAT GGTGCAACAT CAACTAGTAC 11220 
ACAACCTCCG CCGGAGTATG CTACAGCGTG 11280 
GCCGAAGAAA AAGGAAGAAA ATGTAGAGAG 1134 0 
GGGTAAGAAT GGAAGGACTA CAGTAGGTGA 11400 
GGATTGCAAA AACAATATTG ACATTAGTCA 11460 
AAAACTATGT TTATATTATA TAGCACATGA 11520 
TAATTTGAAA GATGCTTTTA TTAAAACTGC 11580 
TTATAAGAGT AAGAATGATA GTGAAGCTAA 1164 0 
ATTTTTAAGA TCCATGATGT ACACGTTTGG 11700 
TATATCTAAA AAACAAAATG ATGTAGCTAA 1176 0 
AAAAGATGGC AGCAAATCTC CTAGTGGCTT 11820 
TCCAGAGATT TGGAAAGGAA TGTTATGTGC 1188 0 
CA AAAG AAAA ATCAAAAACG ACTACTCATA 11940 
CCCTTCCCTT GAAGAGTTTG CTGCTAAACC 12000 
AGAAGAGTTT TGTGCTGAAC GTCAGAAGAA 12060 
AATAAATTCT ACACAACAGT GTAATGATGC 1212 0 
ATATCAAGAA TATGTTGAAA ATAAAAAAAA 1218 0 
TCTAAAGGCA AATGTTCAGC CCCAAGATCC 12240 
CGTACAACCG ATACAGGGGA ATGAGTATTT 123 00 
TTGCATGGAT GGAAATGTAC TTTCCGTCTC 12360 
CCATAAATAT CCTGAGAAAT GTGATTGTTA 1242 0 
TCCCCCCCCA CCTGTACAAC CACAACCGGA 12480 
CATAGTAAAA ACACTATTTA AAGACACAAA 12540 
CGGCAAAACC GCACCATCCA GTTGGAAATG 12600 
TGCCACCACC GGCAAAAGTG GTAGTGATAG 12660 
ACGATTATAT GTGGGGAAAC TACAGGAGTG 12720 
CGCGCCGTCC CACTCACGCG CCGACGACTT 12780 
AGAGACTTTT TTCTTATGGG ATAGATATAA 12840 
ACAACAAGCA CTATCACAAC TAACCAGTAC 12900 
CAAACTGTTA CAAAATGGTA AGATACCCCC 12960 
AGGAGATTAT AGGGATATTT TAGTACACGG 13020 
TGGTAGTAAC AACAACAATA TTGTGCTTGA 13080 
AATACAAGAG AAAATAGAAC AAATTCTCCC 13140 
ATCTAGTGCG CAAACACCTG ATAAATGGTG 13200 
GCCGAATCTA TCTGGAAAGG TATGATATGT GCATTGACAT ATACAGAAAA 13260 
^S AGTGCAA GAGGCGACGA AAACAAAATA GAAAAC^A^ ATGAAG^otJ isala 
^S^ 1 ^ TTTGGCAGCA CAGCCGACAA ACATGGCACA GCCTCAACCC CAACCGGrar ! p 
ATACAAAACC CAATACGACT ACGAAAAAGT CAAACTTGAG GATACAAGTG gSSSp 
CCCCTCAGCC TCTAGTGATA CACCCCTTCT CTCCGATTTC ctSSSgCC C^Sa^JS J« J2 
CCGTTACCTT GAAGAATGGG GTCAAAATTT TTGTAAAAAA AgSSgSS J??!S 
AATAAAACAT GAGTGTAAAG TAGAAGAAAA TGGTGGTGCT AGTCGTOGTG CTgI™™^ n 

aagacaatat agtggggatg gcgaagcgtg taatgagatg c?tccaSS aS??SSJS illii 
5ESS5£ T TTAGAAAAGC cgagttgtgc caaaccttgt agttc^tata g1S£?Sa*? S?JS 

a^S™ C3GAAAAGAGT TTGAGAAACA AGAAAAGGCA TATGAACAAC AAAAAGACAA 138^ 
^ G ^™ T GGAAGTAATA AGCATGATAA TGGATTTTGT GAAACACTAA CAACGTcSc lllin 
TAAAG CTAAA GACTTTTTAA AAACGTTAGG ACCATGTAAA CCTAATAATG TATARrptos 
AACAATTTTT GATGATGATA AAACCTTTAA ACATACAAAA gJSS^TC CaISSStAA I^rS 
ATTTAGTGTT AATTGTAAAA AAGATGAATG TGATAATTCT AAAGGAACCG STTPrrriii 
TAAAAATAGT ATTGATGCAA CAGATATTGA AAATGGAGTG GATTCTACTG TA^TARAAAT 
G ^ G ^ CAGT GCTGATAGTA AAAGTGGATT TAATGCTGAT GGtSSSS I^GCT^S Itlii 
A ^ G TGCTGGT ATCTTTGAAG GTATTAGAAA AGATGAATGG AAATGTcSS AtcSSSS Itlto 
TTATGTTGTA TGTAAACCGG AAAACGTTAA TGGGGAAGCA AAGGGAAAAC ACATTATArA unon 
AATTAGAGCA CTGGTTAAAC GTTGGGTAGA ATATTTTTTT GAAGATTATA aI^SSSS Itllo 
A ^ A T AAAA II TCACATCGCA TAAAAAATGG TGAAATATCT CCATGTATAA JuSStTOTQT J O 
AGAAAAATGG GTAGATCAGA AAAGAAAAGA ATGGAAGGAA ATTACTGAAC GTTT^AAA^a n^cn 
^ACAATT CAGATGATGA CAATGTGAcS iSSSSSS Jl SS 

ACTGATGCAA ACGCTAAAAA TAAGGTTATA AAATTAAGTA AGTTCGGTAA 14580 
TTCTTGTGGA TGTAGTGCCA GTGCGAACGA ACAAAACAAA AATGGTGAAT ACAAGGArrr J1S X 
^ A I AGA ^ T ATGCTTAAAA AGCTTAAAGA TAAAATTGGC GAGTGcSa^ JSSSSSS"' ll?J o 
^AA^T GATACCGAGT GTTCCGACAC ACCACAACCG CAAACCCTTG AAGACGAAAC ^7fiS 
TTTGGATGAT GATATAGAAA CAGAGGAGGC GAAGAAGAAC ATGATGCCGA AAATTTGT?? Ilflf n 
AAAA S AGCAC AACAAGAGGA TGAAGGCGGT JSSSSSS 2£££ft5£ liSSS 
^AAGAACCG GCAGCAACAG ATAGTGGTAA GGAAACCCCC GAACAAACCC CCGTTCTCAA ItHo 
a^**™^ GAAGCAGTAC CGGAACCACC ACCTCCACCC CCACAGGAAA AAGCCCoSc litio 
ACCAATACCC CAACCACAAC CACCAACCCC CCCCACACAA CTCTTGGATA ATCCCCACTT 15060 
JESSES** GTGGTGACCT CCACCCTCGC CTGGAGCGTT GgStcSS toSaCOT lllti 
^EZZZJT' TATCTA AAGG TAAATGGAAG TATATATATG GGGATGTGGA TGtSSSSS llllo 

tgtatgtgaa tgtatgtgga tgtatgtgga tgtatgtgga tgtgttttat ggata?gS? lllln 

SS™SI GTTTGGATAT ATATATATAT ATATATATGT tSSgTMAT SSSttSJ lilt 0 
JSJSE^ GTGTATGTAT ATGATTTTCT. GTATATGTAT TTGTGGGTTA AgSaTaSJa Hill 
IS^E? TA CTTGTATG TGTTTTATAT ATATATTTTA TATATATGTA TTTATaSS Hill 
AAA ^S AAA T A TAAAAACAAA TTTATTAAAA TGAAAAAAAG AAAAATGAAA TATAAAAAAA Ittlo 
^Z^Z^ AATAAAAAAA AAAAAAAAAA AAAAGGAGAA AAATTTTTTA AAAAATAATA SmS 
AAAATTATAA TAAAATATAA ATTTTGATAG AATAAAAAAT GAAAAAGATT ATCAAAAaIa « 

AA Ia AAAAAA aaattttata taaaaaaaaa atgattataa aaaaaataaa aacaaaagaa lltll 

a AA ^ A t« AAAA AACATTAAAA AAAAAAAAAT ATATATCATA AAAACAAAAA AAAAAGAAAA 15720 
AAATATATTA AAATAAAAAT ATATATCATA AAATAAAAAA AAATTAAAAA AATGTTAAAA 15780 
AAAAAATATA TACATAAAAT AAAAAAAATT TATTTAAATA AAAAAAAATA ATAAATAAAA 15840 
AAATTTAATT AAATAAAAAA AAATAATAAA TAAAAAAATT TAATTAAATA AAAAAAAATT 15900 
AAAAAAAATT AATGAAATAA AAAAAAATAA AAAAATTTAA TTAAATAAAA AAAATAAAAT 15960 
^SZZ^Z? ACATGCACAT ATACATACAT ATATATATAT ATATACCCAT AACTACATAC HHo 
^^IT^ACA CATACATATA TATATATATA TATACCCATA ACTACATACA CATTTACACA 16080 
T A ™ TATA TATTATATAT ATATATATAT ATACCCATAA CTACATACAT ATATACATTA llllo 
ACAAACACAT ATATAATACC TAAATACATA TATACATACA CATATATGTT CATTTTTTTT 16200 
™ GAAAAA AACCAAATCA TCTGTTGGAA ATTTATTCCA AATAcTGCAA ATAcSSaI 162G0 
G ^ A I TATCA TATACCGACA AAACTTTCAC CCAATAGATA TATACCTTAT ACTAGTGGTA lllll 
CAAACGGTAC ATTTACCTTG AAGGAGATAG TGGAACAGAT AGTGGTTACA 16380 
C S G ™JI A T AGTGATATA ACTTCCTCAG AAAGTGAATA TGAAGAGATG GATATAAATG 16440 
A ™?ATGT ACCAGGTAGT CCTAAATATA AAACATTAAT TGAAGTGGTA CTTGAAcSS UtoO 
GTGGTAACAA CACAACAGCT AGTGGTAACA ACACAACAGC TAGTGGTAAC AACACAACAG 16560 
"AGTGGTAA AAACACACCT AGTGATACAC AAAATGATAT ACAAAATGAT GgSSSSa let 20 
GTAGTAAAAT TACAGATAAT GAATGGAATC AATTGAAAGA TGAATTTATA TCACAATATC 16680 

tacaaagtga accaaataca gaaccaaata tgttaggtta taatgtggat aatSJItcc 16740 

ATCCTACCAC GTCACATCAT AATGTGGAAG AAAAACCTTT TATTATGTCC ATTCATGATA 16800 

gaaatttatt tagtggagaa gaatacaatt atgatatgtt taatagtggg aatStSS llleo 

TAAACATTAG TGATTCAACA AATAGTATGG ATAGTCTAAC AAGTAACAAC CATAGTCCAT 16920 
ATAATGATAA AAATGATTTA TATAGTGGTA TCGACCTAAT CAACGACGCA CTAAGTgSI Itllo 
SSSSES* TATATATGAT GAAATGCTCA AACGAAAAGA AAATGAATTA StGGaS llllo 
AACATCATAC AAAACATACA AATACATATA ATGTCGCCAA ACCTGCACGT GACGACCCTA 17100 
TAACCAATCA AATAAATTTG TTCCATAAAT GGTTAGATAG GCATAGAGAT ATGTGCGAAA 17160 
AGTGGAAAAA TAATCACGAA CGGTTACCCA AATTGAAAGA ATTGTGGGAA AATGAGACAC 17220 
ATAGTGGTGA CATAAATAGT GGTATACCTA GTGGTAACCA TGTGTTGAAT ACTGATGTTT 17280 
CTATTCAAAT AGATATGGAT AATCCTAAAA CAAAGAATGA AATTACGAAT ATGGATACAA 17340 
ACCCAGACAA ATCTACTATG GATACTATAC TGGATGATCT GGAAAAATAT AATGAACCCT 17400 
ACTACTATGA TTTTTATGAA GATGATATCA TCTATCATGA TGTAGATGTT GAAAAATCAT 17460 
CTATGGATGA TATATATGTG GATCATAATA ATGTGACTAA TAATAATATG GATGTACCTA 17520 
CTAAAATGCA CATCGAAATG AATATTGTTA ATAATAAAAA GGAGATTTTC GAAGAGGAAT 17580 
ATCCTATATC AGATATATGG AATATCTAAA ATTAATATAC TTTTTTTGTG TGTGTCATAT 17640 

ATATTTTGTA TTATTTGTAT ATGTTTTTAT TTTATTTATT TATTTATTTA TTTATTGTTT 17700 
TTGGTATATT TGTAAAAAAT ATGTTTTTGT TTATAATCAT ATTATTATAT TTTTAATAAT 17760 
TTGCAACATG ATTTTTTTTT TTCTTTCTTA TTGTGTAATT TTTTTCATAA TATTTATATA 17820 
TATATATGTA TTTTATTTTT TAGTATAATA ATTGTATCTA TATTTGATTA ATAATTATGT 17880 
ATATTATGGT TATTTTGTTT CTTTTTCTGT ACATTTTTTC GTAATATATA TATATATATA 17940 

TATATATAAT TCTCTTTTTC TAATATATAT ATCCTTCTAT TTTCGATTTT TTCATTTTTT 18000 
TCCAGTATTA ATTTATTTAT TTATTTGTGA TATTTTATAA TATATTATTT AAATGTGTAT 18060 
TTATATATGT GTTTTATATA TGTGTTTTAT TTTTGTTACT CTAATTCTGA ATAATCCGAG 18120 
CGAAAAAAAA ATATATAATC TCATATAAAA ATTATTTATA ATACAATATT ATATAGTTTC 18180 
CTATTAAAAT AAATTAATAT AATATACAAT AATATTTCTT GTTATTTTTA TAAATATAAC 18240 

TAATTTCTTA TTTTTATTTA ACTTTATTCC TTTTTAATTT CTTAATTCTT TTATCAAACA 18300 
AAAAACATAA AGTAATTCTA CATATCAACA AAAAAAAAAA AAAAAAAAAA AAAAAAAATT 18360 
TATTATAATA TAATAAAAAA TATAAAGACA TAGGTTGAGT TATTATTATA AATGATTTAT 18420 
TACGATTAAA ACATATTGAG ATTATAATAA TATAATTTAA CATAGAAAGA GTTAAGAATA 18480 
CATTTTTTTT TTTATTTCGA TATGTAATTC AACATATATA TATATATATA TCTTTTTAAT 18540 

TTAATTAAAT AAAATTCCTT ATTATTCATA TTGTTTCTTT TATCACATGT GAAATATTAA 18600 
AAATAATTTT CGATTTTATC GATATATTTA TGTCGTTTAT ATACTTATAT AGGTCTTTAT 18660 
AACTATTGAT TAATAGAAGG TAATAGCCTA ATAATATAAA TACTCGTATT TATAAATTCA 18720 
TTTATATATT TCAAATATAT TTGCATGGTT TATTTTCAAA TACAATTAAT TAGATTTCTT 18780 
AAATATTTCT TCATTTATTC ATTTTTATAG CATATACATG CACATTATAA ATTATTAATA 18840 

AAAAATTTTT ATTTTAATAT ATAATAACAA TTTTCATACA TTACATTTTT CACACAACAT 18900 
TTAAGTTGTC ATAATGTAAC ACATTAAATA ATATATTACT TATATATATA TAATTATTAA 18960 
TTATATATTA AATAAAAATG TATTATCGCC TGTATTATCA TAGTATATAT AATGTTGTAT 19020 
AACGCTTCAA AATATATATA ATAATATAAT TAAAAATATA TATATAGTAA TTAATTATTT 19080 
TGTTATGTTA TGTAATAATG CAATTAATAT AAGATAAAAT TCAT 19124 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 3060 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



Met Val Glu Leu Ala Lys Met Gly Pro Lys Glu Ala Ala Gly Gly Asp 
1 5 10 15 

Asp Ile Glu Asp Glu Ser Ala Lys His Met Phe Asp Arg Ile Gly Lys 
20 25 30 

Asp Val Tyr Asp Lys Val Lys Glu Glu Ala Lys Glu Arg Gly Lys Gly 
35 40 45 

Leu Gln Gly Arg Leu Ser Glu Ala Lys Phe Glu Lys Asn Glu Ser Asp 
50 55 60 

Pro Gln Thr Pro Glu Asp Pro Cys Asp Leu Asp His Lys Tyr His Thr 
65 70 75 80 

Asn Val Thr Thr Asn Val Ile Asn Pro Cys Ala Asp Arg Ser Asp Val 
85 90 95 

Arg Phe Ser Asp Glu Tyr Gly Gly Gln Cys Thr His Asn Arg Ile Lys 
100 105 110 

Asp Ser Gln Gln Gly Asp Asn Lys Gly Ala Cys Ala Pro Tyr Arg Arg 
115 120 125 

Leu His Val Cys Asp Gln Asn Leu Glu Gln Ile Glu Pro Ile Lys Ile 
130 135 140 
Thr Asn Thr His Asn Leu Leu Val Asp Val Cys Met Ala Ala Lys Phe 
Glu Gly Gin Ser lie Thr Gin Asp Tyr Pro tys Tyr Gin Ala Thr Ty2 
Gly Asp Ser Pro Ser Gin He Cys Thr Met Leu Ala Arg Ser Phe Ala 
Asp lie Gly Asp He Val Arg Gly ig Asp Leu Tyr Leu Gly Asn Pro 



10 



20 



30 



200 



Gin Glu He Lys Gin Arg Gin sin Leu Glu Asn Asn Leu Lys Thr lie 



215 



Phe Gly Lys lie Tyr Glu Leu Asn Gly Ala Slu Ala Arg Tyr Gly 

Asn Asp Pro Glu Phe Phe Lys Leu Arg Glu Asp Trp Trp Thr Ala Asn 

Arg Glu Thr Val Trp Lys Ala lie Thr £s Asn Ala Trp Gly A^n Thr 

_ ■ 265 

Arg 

Arg Cys Asn Asp Asp Gin Val Pro Thr Tyr Phe Asp T^ Val Pro Gin 

_ 295 300 

Tyr Leu Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg Lys Lys 
Asn Lys Lys ile Lys Asp Val Lys Arg Asn Cys Arg Gly Lys Asp 

25 ; . Gi U A 9 n T M ^:»:: -,_ ... . _ , 330 . 335 



Tyr Phe His Ala Thr Cys Asn Arg Gl" Glu Arg Thr Lys Gly Tyr Cys 



310 315 320 

Lys 

Lys 

Thr Lys Arg Ala He Gly Lys Leu Arg Tyr Gly Lys Gin £s lie Ser 

360 



Asn Lys Lys lie Lys Asp Val Lys Arg Asn Cys Arg Gly Lys Asp Lys 

Arg 

3 ?° : " " " ~3« ago 



Glu Asp Lys Asp Arg Tyr cys Ser Arg Asn Gly Tyr Asp Cys Glu Lys 



Cys Leu Tyr Ala Cys Asn Pro Tyr Val Asp Trp lie Asn Asn Gin Lys 
Glu Gin Phe Asp Lys Gin Lys Lys Lys Tyr Asp Glu Glu lie Lys Lys 



Tyr Glu Asn Gly Ala Ser Gly Gly Ser Arg Gin Lys Arg Asp Ala Gly 
35 Gly Thr Thr Thr Thr Asn Tyr Asp Gly 1£ Glu Lys Lys Phe }}r Asp 

Glu Leu Asn Lys Ser Glu Tyr Arg IZr Val Asp Lys Phe Leu Glu Lys 
T 440 445 

40 Hi ASn ° 1U G1U Ile ^r Lys Val Lys Asp Glu Glu Gly Gly 

Thr He Asp Phe Lys Asn Val Asn Ser Asp Ser Thr Ser Gly Ala Ser 
Gly Thr Asn Val Glu Ser Gin Gly Thr Phe Tyr Arg Ser Lys Tyr Cys 

45 Gin Pro; cys Pro Tyr Cys Gly Val Lys Lys° Val Asn Asn Gly Gly Ser 

Ser Asn Glu Trp Glu Glu Lys Asn Asn Gly Lys Cys Lys Ser Gly Lys 

50 £0 G1U Pro L ^ s Pro A^p Lys° Glu Gly Thr Thr He Thr lie Leu 

Lys Ser Gly Lys Gly His Asp Asp Ile Glu G*lu Lys° Leu Asn Lys Phe 

550 555 _- n 

Cys Asp Glu Lys Asn Gly Asp Thr lie Asn Ser Gly Gly Ser Gly Thr 

55 Gly Gly Ser Gly Gly Gly Asn Ser Gly A^g Gin Glu Leu Tyr gIu Glu 

„ . 580 585 590 

Trp Lys Cys Tyr Lys Gly Glu Asp Val Val Lys Val Gly His Asp Glu 

60 ASP JS G1U G1U ASP ^ Glu ^sn Val Lys Asn Ala Gly Gly Leu Cys 

OJ - u 615 620 



635 



Glu Lys Glu Pro Asp gIu Ile Gin Lys Thr Phe Asn Pro Phe Phe Tyr 



He Leu Lys Asn Gin Lys Lys Asn Lys Glu Glu Gly Gly Asn Thr Ser 

o^o 630 " — 

Glu Lys Glu Pro Asp Glu 

65 Tyr Trp Val Ala Hif Met Leu Lys Asp Ifr He His Trp Lys Lys Lys 
660 665 670 

Leu Gln Arg Cys Leu Gln Asn Gly Asn Arg Ile Lys Cys Gly Asn Asn 
675 680 685 

Lys Cys Asn Asn Asp Cys Glu Cys Phe Lys Arg Trp Ile Thr Gln Lys 
690 695 700 

Lys Asp Glu Trp Gly Lys Ile Val Gln His Phe Lys Thr Gln Asn Ile 
705 710 715 720 

Lys Gly Arg Gly Gly Ser Asp Asn Thr Ala Glu Leu Ile Pro Phe Asp 
725 730 735 

His Asp Tyr Val Leu Gln Tyr Asn Leu Gln Glu Glu Phe Leu Lys Gly 
740 745 750 

Asp Ser Glu Asp Ala Ser Glu Glu Lys Ser Glu Asn Ser Leu Asp Ala 
755 760 765 

Glu Glu Ala Glu Glu Leu Lys His Leu Arg Glu Ile Ile Glu Ser Glu 
770 775 780 

Asp Asn Asn Gln Glu Ala Ser Val Gly Gly Gly Val Thr Glu Gln Lys 
785 790 795 800 

Asn Ile Met Asp tf$?s Leu Leu Asn Tyr Glu Lys Asp Glu Ala Asp Leu 
805 810 815 

Cys Leu Glu Ile His Glu Asp Glu Glu Glu Glu Lys Glu Lys Gly Asp 
820 825 830 

Gly Asn Glu Cys Ile Glu Glu Gly Glu Asn Phe Arg Tyr Asn Pro Cys 
835 840 845 

Ser Gly Glu Ser Gly Asn Lys Arg Tyr Pro Val Leu Ala Asn Lys Val 
850 855 860 

Ala Tyr Gln Met His His Lys Ala Lys Thr Gln Leu Ala Ser Arg Ala 
865 870 875 880 

Gly Arg Ser Ala Leu Arg Gly Asp Ile Ser Leu Ala Gln Phe Lys Asn 
885 890 895 

Gly Arg Asn Gly Ser Thr Leu Lys Gly Gln Ile Cys Lys Ile Asn Glu 
900 905 910 

Asn Tyr Ser Asn Asp Ser Arg Gly Asn Ser Gly Gly Pro Cys Thr Gly 
915 920 925 

Lys Asp Gly Asp His Gly Gly Val Arg Met Arg Ile Gly Thr Glu Trp 
930 935 940 

Ser Asn Ile Glu Gly Lys Lys Gln Thr Ser Tyr Lys Asn Val Phe Leu 
945 950 955 960 

Pro Pro Arg Arg Glu His Met Cys Thr Ser Asn Leu Glu Asn Leu Asp 
965 970 975 

Val Gly Ser Val Thr Lys Asn Asp Lys Ala Ser His Ser Leu Leu Gly 
980 985 990 

Asp Val Gln Leu Ala Ala Lys Thr Asp Ala Ala Glu Ile Ile Lys Arg 
995 1000 1005 

Tyr Lys Asp Gln Asn Asn Ile Gln Leu Thr Asp Pro Ile Gln Gln Lys 
1010 1015 1020 

Asp Gln Glu Ala Met Cys Arg Ala Val Arg Tyr Ser Phe Ala Asp Leu 
1025 1030 1035 1040 

Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp Glu Asp Lys Ser Ser 
1045 1050 1055 

Thr Asp Met Glu Thr Arg Leu Ile Thr Val Phe Lys Asn Ile Lys Glu 
1060 1065 1070 

Lys His Asp Gly Ile Lys Asp Asn Pro Lys Tyr Thr Gly Asp Glu Ser 
1075 1080 1085 

Lys Lys Pro Ala Tyr Lys Lys Leu Arg Ala Asp Trp Trp Glu Ala Asn 
1090 1095 1100 

Arg His Gln Val Trp Arg Ala Met Lys Cys Ala Thr Lys Gly Ile Ile 
1105 1110 1115 1120 

Cys Pro Gly Met Pro Val Asp Asp Tyr Ile Pro Gln Arg Leu Arg Trp 
1125 1130 1135 

Met Thr Glu Trp Ala Glu Trp Tyr Cys Lys Ala Gln Ser Gln Glu Tyr 
1140 1145 1150 

Asp Lys Leu Lys Lys Ile Cys Ala Asp Cys Met Ser Lys Gly Asp Gly 
1155 1160 1165 

Lys Cys Thr Gln Gly Asp Val Asp Cys Gly Lys Cys Lys Ala Ala Cys 
1170 1175 1180 
Asp Lys Tyr Lys Glu Glu Ile Glu Lys Trp Asn Glu Gln Trp Arg Lys 



1190 noc 

1195 1200 

Ser 

5 

Gin 

Gin Met Va^Asp Phe Leu Thr *r*>jl?*l. Lys Ala Ser He^Ala A la 



He Ser Asp Lys Tyr^Asn Leu Leu Tyr Leu^Gln Ala Lys Thr Thr Ser 

Leu Gly Asp Asp Asp Pro I 
1225 : 
Pro lie His Lys Ala Ser 1 
" 40 1245 



1205 I5in 

nrto fK l„ St9 ™ L - «51MP *»P Asp Pro ** p £^ ln 

1225 1230 



Alalia Pro lie Thr Pro TyTser Thr Ala Ala G^Tyr He His Gin 



Arg Val^eu Val Lys Arg Alalia Gly Ser Pro Thr GluHe Ala Ala 

Tyr 
Cys i 

Lys His Gly AlaThr'ser Thr Ser Thr^Lys Glu Asn Lys G^uTyr 



1270 



Glu lie Gly Tyr Gly^Iy'cys Gin Glu Gin Thr'cin Phe Cys Glu Lys" 

- 1290 1295 



Thr Phe LyS s Gln Pro Pro Pro Gl^Tyr Ala Thr Ala Cys A^Cys lie 
Asn Arg^er Gin Thr Glu Gl^Pro Lys Lys Lys Glu cfuAsn Val Glu 



SerAla Cys Lys He Val^i^Lys He Leu Glu^I^s Asn Gly Arg 



Thr Thr Val Gly Gl^cys-Asn Pro Lys Glu Kr^r Pro Asp Trp Asp° 

Asn 2 

Gin I 

1395 1400 " ; -■— ll~ 0 s 



-? 65 "70 13 75 

±j Pro ] 

Arg Arg Gin £ys W Leu Cys Leu Tyr Ty^Ile Ala His Glu Ser Gin Thr 



Cys Lys Asn Asn He Asp He Ser His As'p W Ala Cys Met P^rb 

xoou 1385 



° 1U ilia* 1 * ^ ASP t!?, ASn LSU LyS *** Ala Phe lie Lys Thr 



Alalia Ala Glu Thr Phe^eu Ser Trp Gin Tyr Tyr Lys Ser Lys Asn 
Asp Ser Glu Ala Ly^Ile Leu Asp Arg Gly L^Ile Pro Ser Gin Phe ° 
Leu Arg Ser Me^Met Tyr Thr Phe Gly^Tyr Arg Asp He Cys^Leu 
Asn Thr AspHe Ser Lys Lys Gin Asn Asp Val Ala Lys Ala °Lys Asp 

LyS ^Io Gly ^ Phe X| 5 LyS ASP Gly Ser Lys Se^Pro Ser Gly 

Leaser Arg Gin Glu Trp Trp Lys Thr Asn Gly Pr^Glu He Trp Lys 



Gly Met Leu Cys Ala Leu^Thr Lys Tyr Val Th^Asp Thr Asp Asn Lys' 
Arg Lys He Lys ^sn Asp Tyr Ser Tyr ^ Lys Val Asn Gin Se" Gin 
Asn Gly Asr^Pro Ser Leu Glu Gl^Phe Ala Ala Lys Pro G^Phe Leu 
Arg Trp^et He Glu Trp Gly Glu Glu Phe Cys Ala G^u^Arg Gin Lys 



Ly^Glu Asn He He Lysis'pAla Cys Asn Glu He^Asn Ser Thr Gin 
Gin cys Asn Asp Ala Lys His Arg Cys Asn Gl^Ala Cys Arg Ala Tyr° 



i- 605 1610 1615 

Thr A 
) 

Tyr L 

Gly Tyr Glu Tyr Lys Asp Gly VaTem Pro He Gin G^y Asn Glu Tyr 

- ° 3U ... 1655 1660 



Asn Phe Val^euVs Ala Asn ValSfpro Gin Asp Pro C^Tyr Lys 



Gin Glu Tyr Val^Glu Asn Lys Lys Lys^Glu Phe Ser Gly Gin ihr^Asn 

jys 
jys 
Ys 

1670 1675 
Val Leu Ser Val Ser^Pro Lys Glu Lys Pro^Phe Gly Lys Tyr Ala K±l° 

Lys Tyr Pro Glu Lys Cys Asp Cys Tyr .ton°01y Lys His Val P^Ser 



LeuLeu Gin Lys Cys Asp Asn Asn Lys Cys Ser Cys Met Asp Gly Asn 
1700 1705 1710 

Ile Pro Pro Pro Pro Pro Pro Val Gln Pro Gln Pro Glu Ala Pro Thr 
1715 1720 1725 

Val Thr Val Asp Val Cys Ser Ile Val Lys Thr Leu Phe Lys Asp Thr 
1730 1735 1740 

Asn Asn Phe Ser Asp Ala Cys Gly Leu Lys Tyr Gly Lys Thr Ala Pro 
1745 1750 1755 1760 

Ser Ser Trp Lys Cys Ile Pro Ser Asp Thr Lys Ser Gly Ala Gly Ala 
1765 1770 1775 

Thr Thr Gly Lys Ser Gly Ser Asp Ser Gly Ser Ile Cys Ile Pro Pro 
1780 1785 1790 

Arg Arg Arg Arg Leu Tyr Val Gly Lys Leu Gln Glu Trp Ala Thr Ala 
1795 1800 1805 

Leu Pro Gln Gly Glu Gly Ala Ala Pro Ser His Ser Arg Ala Asp Asp 
1810 1815 1820 

Leu Arg Asn Ala Phe Ile Gln Ser Ala Ala Ile Glu Thr Phe Phe Leu 
1825 1830 1835 1840 

Trp Asp Arg Tyr Lys Glu Glu Lys Lys Pro Gln Gly Asp Gly Ser Gln 
1845 1850 1855 

Gln Ala Leu Ser Gln Leu Thr Ser Thr Tyr Ser Asp Asp Glu Glu Asp 
1860 1865 1870 

Pro Pro Asp Lys Leu Leu Gln Asn Gly Lys Ile Pro Pro Asp Phe Leu 
1875 1880 1885 

Arg Leu Met Phe Tyr Thr Leu Gly Asp Tyr Arg Asp Ile Leu Val His 
1890 1895 1900 

Gly Gly Asn Thr Ser Asp Ser Gly Asn Thr Asn Gly Ser Asn Asn Asn 
1905 1910 1915 1920 

Asn Ile Val Leu Glu Ala Ser Gly Asn Lys Glu Asp Met Gln Lys Ile 
1925 1930 1935 

Gln Glu Lys Ile Glu Gln Ile Leu Pro Lys Asn Gly Gly Thr Pro Leu 
1940 1945 1950 

Val Pro Lys Ser Ser Ala Gln Thr Pro Asp Lys Trp Trp Asn Glu His 
1955 1960 1965 

Ala Glu Ser Ile Trp Lys Gly Met Ile Cys Ala Leu Thr Tyr Thr Glu 
1970 1975 1980 

Lys Asn Pro Asp Thr Ser Ala Arg Gly Asp Glu Asn Lys Ile Glu Lys 
1985 1990 1995 2000 

Asp Asp Glu Val Tyr Glu Lys Phe Phe Gly Ser Thr Ala Asp Lys His 
2005 2010 2015 

Gly Thr Ala Ser Thr Pro Thr Gly Thr Tyr Lys Thr Gln Tyr Asp Tyr 
2020 2025 2030 

Glu Lys Val Lys Leu Glu Asp Thr Ser Gly Ala Lys Thr Pro Ser Ala 
2035 2040 2045 

Ser Ser Asp Thr Pro Leu Leu Ser Asp Phe Val Leu Arg Pro Pro Tyr 
2050 2055 2060 

Phe Arg Tyr Leu Glu Glu Trp Gly Gln Asn Phe Cys Lys Lys Arg Lys 
2065 2070 2075 2080 

His Lys Leu Ala Gln Ile Lys His Glu Cys Lys Val Glu Glu Asn Gly 
2085 2090 2095 

Gly Gly Ser Arg Arg Gly Gly Ile Thr Arg Gln Tyr Ser Gly Asp Gly 
2100 2105 2110 

Glu Ala Cys Asn Glu Met Leu Pro Lys Asn Asp Gly Thr Val Pro Asp 
2115 2120 2125 

Leu Glu Lys Pro Ser Cys Ala Lys Pro Cys Ser Ser Tyr Arg Lys Trp 
2130 2135 2140 

Ile Glu Ser Lys Gly Lys Glu Phe Glu Lys Gln Glu Lys Ala Tyr Glu 
2145 2150 2155 2160 

Gln Gln Lys Asp Lys Cys Val Asn Gly Ser Asn Lys His Asp Asn Gly 
2165 2170 2175 

Phe Cys Glu Thr Leu Thr Thr Ser Ser Lys Ala Lys Asp Phe Leu Lys 
2180 2185 2190 

Thr Leu Gly Pro Cys Lys Pro Asn Asn Val Glu Gly Lys Thr Ile Phe 
2195 2200 2205 

Asp Asp Asp Lys Thr Phe Lys His Thr Lys Asp Cys Asp Pro Cys Leu 
2210 2215 2220 
Lys Phe Ser Val Asn Cys Lys Lys Asp Glu Cys Asp Asn Ser Lys Gly 

Thr Asp cys Arg As^Lys Asn Ser He Asp Ala^Thr Asp He Glu Asn° 

2250 



Gly Val Asp Ser Thr Val Leu Glu Met Arg Val Ser Ala Asp s" lys 



Ser Gly Ph^Asn Gly Asp Gly Leu^Glu Asn Ala Cys Arg gT y Ala Gly 



2265 2270 
Glu Asn Ala Cys Arg Gly 1 

He Phe^Glu Gly lie Arg Lys^Glu Trp Lys Cys A^Asn Val Cys 

Gl^xyr Val Val Cys Ly^Pro Glu Asn Val Asn^W^Glu Ala Lys Gly 

Arg Ala Leu Val Lys Arg Trp 
2330 

Lys lie Lys His Lys lie Ser 
Lys Asn Gly Glu He Ser Pro Cys He lys Asn Cys Val GluLys Trp 



Lys His lie He GlnHe Arg Ala Leu Val Lys^Arg Trp Val Glu £yr 

Phe Phe Glu Asp o Tyr Asn Lys He Lys His ^ys He Ser His 

2345 



Val As^Gln Lys Arg Lys Gl^Trp Lys Glu He Thr Glulrg Phe Lys 
Asp^m Tyr Lys Asn Asp^Asn Ser Asp Asp Asp AsTval Arg Ser Phe 
Leu Glu Thr Leu Il^Pro Gin lie. Thr As^Ala^Asn Ala Lys Asn Lys ° 
Val lie Lys Leaser Lys Phe Gly AsnJe^Cys Gly Cys Ser A^a'ser 
Ala Asn Glu^Gln Asn Lys Asn Gly^Tyr Lys Asp Ala H^Asp Cys 
Met Le^Lys Lys Leu Lys Asp^He Gly Glu Cys oiu\ y8 Lys His 
Hi^Gln Thr ser Asp ThrGlu Cys Ser Asp Thr ProVln Pro Gin Thr 

2475 2480 

I 

Lys Asn Met Met^ro Lys He Cys Glu AsTval Leu Lys Thr Al^Gln 



Leu Glu Asp Glu Thr Leu Asp Asp Asp He Glu Thr Glu Glu Ala lys 
2485 2490 



2 - 500 2505 2510 



Gin Glu As^Glu Gly Gly Cys Valero Ala Glu Asn Ser Glu Glu Pro 
Ala Al^Thr Asp Ser Gly Lys^Thr Pro Glu Gin T^Pro Val Leu 



l 515 - 2520 2 5 25 

Lys Pro Glu Glu Glu Ala VaTpro Glu Pro Pro Pro°Pro Pro Pro Gin 



2545 2550 Pro Gin 

Glu Lys Ala Pro Ala^Pro He Pro Gin Pro G^Pro Pro Thr Pro Pro° 

2570 

Thr Gin Leu Le^Asp Asn Pro His Val ^eu Thr Ala Leu Val Thr^ser 

Thr Leu Ala Trp Ser Val Gly lie "y^Phe Ala Thr Phe S°Tyr Phe 

2600 260^ 
Tyr Leu Lys Lys Lys Thr Lys Ser Ser Val Gly Asn Leu Phe Gln Ile 

2615 2620 
Leu Gln Ile Pro Lys Ser Asp Tyr Asp Ile Pro Thr Lys Leu Ser Pro 

Asn Arg Tyr He Pro^Tyr Thr Ser °ly Ly^Tyr Arg Gly Lys Arg Tyr° 

He Tyr Leu Glu Gly Asp Ser Gly Thr As|°ser Gly Tyr Thr A^His 

^ ^bbo 2665 2fi7n 

5l75 Ile ^ 26?0 Ser ° 1U G1U G1U " et As P 

Asn Asp lie Tyr Val Pro Gly l"°Pro Lys Tyr Lys Thr\eu He Glu 

2695 2700 
Valval Leu Glu Pro Ser Gly Asn Asn Thr Thr Ala Ser Gly Asn Asn 



2710 2715 o-7oa 

Thr Thr Ala Ser GlyAsn Asn Thr Thr Ala^Ser Gly Lys Asn Thr III 

Ser Asp Thr Gin Asn Asp He Gin Asn A^pGly He Pro Ser S^rLys 
2740 2745 2750 

Ile Thr Asp Asn Glu Trp Asn Gln Leu Lys Asp Glu Phe Ile Ser Gln 
2755 2760 2765 

Tyr Leu Gln Ser Glu Pro Asn Thr Glu Pro Asn Met Leu Gly Tyr Asn 
2770 2775 2780 

Val Asp Asn Asn Thr His Pro Thr Thr Ser His His Asn Val Glu Glu 
2785 2790 2795 2800 

Lys Pro Phe Ile Met Ser Ile His Asp Arg Asn Leu Phe Ser Gly Glu 
2805 2810 2815 

Glu Tyr Asn Tyr Asp Met Phe Asn Ser Gly Asn Asn Pro Ile Asn Ile 
2820 2825 2830 

Ser Asp Ser Thr Asn Ser Met Asp Ser Leu Thr Ser Asn Asn His Ser 
2835 2840 2845 

Pro Tyr Asn Asp Lys Asn Asp Leu Tyr Ser Gly Ile Asp Leu Ile Asn 
2850 2855 2860 

Asp Ala Leu Ser Gly Asn His Ile Asp Ile Tyr Asp Glu Met Leu Lys 
2865 2870 2875 2880 

Arg Lys Glu Asn Glu Leu Phe Gly Thr Lys His His Thr Lys His Thr 
2885 2890 2895 

Asn Thr Tyr Asn Val Ala Lys Pro Ala Arg Asp Asp Pro Ile Thr Asn 
2900 2905 2910 

Gln Ile Asn Leu Phe His Lys Trp Leu Asp Arg His Arg Asn Met Cys 
2915 2920 2925 

Glu Lys Trp Lys Asn Asn His Glu Arg Leu Pro Lys Leu Lys Glu Leu 
2930 2935 2940 

Trp Glu Asn Glu Thr His Ser Gly Asp Ile Asn Ser Gly Ile Pro Ser 
2945 2950 2955 2960 

Gly Asn His Val Leu Asn Thr Asp Val Ser Ile Gln Ile Asp Met Asp 
2965 2970 2975 

Asn Pro Lys Thr Lys Asn Glu Ile Thr Asn Met Asp Thr Asn Pro Asp 
2980 2985 2990 

Lys Ser Thr Met Asp Thr Ile Leu Asp Asp Leu Glu Lys Tyr Asn Glu 
2995 3000 3005 

Pro Tyr Tyr Tyr Asp Phe Tyr Glu Asp Asp Ile Ile Tyr His Asp Val 
3010 3015 3020 

Asp Val Glu Lys Ser Ser Met Asp Asp Ile Tyr Val Asp His Asn Asn 
3025 3030 3035 3040 

Val Thr Asn Asn Asn Met Asp Val Pro Thr Lys Met His Ile Glu Met 
3045 3050 3055 

Asn Ile Val Asn 
3060 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 7295 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

TCCAAGCTGT TTTTTTTTCT TTTTCTAGTT TTTCCATTGT ATATTCGTCA AATACGTACA 60 

CATATATATA TATATGTATA ACATGTGAGT ATTATTTTAT ACATCACATC GATTACATTT 120 

TAGCGTTTTT TTTCCCCAGA TCACATATAG TACGACTAAG AAACAAAATA ACATCATAAC 180 

AAACATAGTG ATTATCAATA CATGATATTA CCACATAATA TAAAGTATTA AATAATATTA 240 

TTGCATGTTA GTGATAACTA CTATATCATA TACACCACTA CTAACTATCA CTACATAGTA 300 

ACAGTAGTAG TCACAATCAT AGCATCATGG TAATATAGAT TTTCATTTCA TATCTTCCTT 360 

ATTGTTTGTT ATACATACAC TATTAATATG TATTTATGTT ATAATGGTAG ACTATGTTAA 420 

CAATGTATGA ATGACCATCA TAAATTAATA ACAGACGCAT CAAAACAGTG TATATGTGTG 480 

CATTTATGAC ATAATGTAGT CGGGAAGCAT ACAAAAATGG AGCCAGGAGG TAGCGGTGGT 540 
SI I™i ™E iiS sass a 
5 ss as™ S ESseS ssss- - 

GATATCGTAA GAGGAAAAGA TCTATATcS ^™ GC AGATATAGGT 1140 

iasssss sasss- diE IS tlEE ™^ s; 
» -sssssss sags adsl iiE jest -sssss is.°- 

TTGCAAAAAC AGTGTCGTGA TTACGAACaI ACTAGAAAAT 1560 

TGCACAAAAA CTATATATAA AAAAGGTAAA CTTG^TATAG ^ CTACGAT 1620 

!0 TCTGTTTGGT GTCGTATGTA TraaaXr^^ tn^tZ G GTGAACATTG TACAAACTGT 1680 

AAGTCCTATA . ATGTTTTTTT TTsSSS^ SSJS^" ACAAiCCGrr 2340 

TGGAAGACAG AACTTAG1AA GTctSSK? SJSgSS "CT 13 ™* 0 2400 

AATAATAAAT GTAAAACAGA TTGTGGTTCT TTTCAAAAAT GGGTTGAAJU^ ^ata^ia/*^* 2460 

GAATGGATGG CAATAAAAGA CCATTTT^ri aa^™**™- GGGTTGAAAA AAAACAACAA 2520 
CTTATCGTAT TTAGTCCCTA TGgSES ™£™Si2 ATATTGTCCA ACAAAAAGGT 2580 

■nil 

5SS5SS 3EKSS SSggi 

a™ s — — »~ Ja~ 

£gs»j sssss sekk? gSSgSs aataaaacat :sr 

^ 25552 ^SSSSgS 2SSSS? sssas ssS 

sjsss jssssss sskkj asss is&ssg EEs 

AATAATAATG CTAATCAATT TTCTAGAACA CTAGGAGCGT CcSSiSG? ?g^Sa^?? tilt 
TTACAAAAGT TAGGATCATG TAAAAATGAT AATGGATATG AGAATGGAGA GGATAATAAA 4500 
ATAGATTTTA AAAATCCAGA TAAAACATTT AAGGAAGCAC ACAGTTGTGA TCCATGTCCT 4560 
ATAACTGGAG TTAAATGTCA AAATGGTCAT TGTGTGGGTT CTGCTAATGG AAAGGAGTGC 4620 
AAAAACAATA AGATTACTGC AGAAGATATT AAAAATAAGA CAGATCCTAA TGGAAACATA 4680 
GAAATGGTTG TCAGTGATGA CAGTACAAAT ACATTTGAAC ATTTAGGCGA TTGTAAAAGC 4740 
TCAGGTATCT TTAAAGGTAT CAGAAAAGAT GAATGGAAAT GCGCTAATGT ATGTGGTGTA 4800 
GATATATGTA CTCTGGAAAA AAAAATTAAG AATGGGCAAG AAGGTGATAA AAAATATATC 4860 
ACAATGAAAG AATTGCTTAA ACGATGGCTA GAATATTTTT TAGAAGATTA TAATAGAATT 4920 
AGAAAAAAAA TAAAGCTATG TACGAAAAAG GAAGATGGAT GCAAATGTAT AAAAGGTTGT 4980 

ATAGAAAAAT GGGTACAAGA AAAAACGAAA GAATGGCAAA AAATAAACGA TACTTATCTT 5040 
GAACAATATA AAAATGATGA TGGTAATACT TTAACTAATT TTTTGGAGCA ATTCCAATAT 5100 
CGAACTGAAT TTAAAAACGC TATAAAACCT TGTGATGGTT TAGACCAGTT CAAGACTTCG 5160 
TGTGGTCTTA ATAGTACTGA TAATTCACAA AATGGTAATA ATAACGATCT TGTTCTATGT 5220 
TTGCTTAATA AACTTCAAAA AAAAATTAGT GAGTGTAAAG AACAACATAG TGGCCAAACC 5280 

CAAACACCGT GTGATAACTC TTCCCTTAGT GGTAAAGAAT CCACCCTCGT TGAAGACGTT 5340 
GATGATTATG AGGAACAAAA CCCAGAAAAC AAAGTGGAAC AACCTAAATT TTGTCCAGAT 5400 
ATGAAAGAAC CAAAAAAAGA AAACGATGAA GAAGTAGGCA CTTGTGGCGG AGACGAAGAA 5460 
AAAAAAAAAG TGGAAGACAG TGTAATCGAA CAAAAAGAGG AAGAAGCAGC TAGTGCCCCA 5520 
GAGGAATCTC CTCCATTAAC CCCGGAAGCA CCAAAAAAAG AGGAAAATGT GGTACCAAAA 5580 

CCACCACCAC CACCAAAAAA ACGCCGAATC AAAACCCGTA ATGTGTTGGA CCACCCCGCT 5640 
GTCATACCCG CCCTCATGTC TTCTACCATC ATGTGGAGTA TTGGCATCGG TTTTGCTGCG 5700 
TTCACTTATT TTTATCTAAA GAAAAAAAGG AAATCATCTG TTGGAAATTT ATTCCAAATA 5760 
CTGCAAATAC CCAAAAGTGA TTATGATATA CCTACATTGA AATCAAGCAA TCGTTATATA 5820 
CCCTATGCAA GTGATAGACA TAAAGGCAAA ACATATATTT ATATGGAAGG AGATAGCAGT 5880 

GGAGATGAAA AATATGCATT TATGTCTGAT ACTACTGATA TAACTTCATC CGAAAGTGAG 5940 
TATGAAGAAT TGGATATTAA TGATATATAT GTACCAGGTA GTCCTAAATA TAAAACATTG 6000 
ATAGAAGTAG TACTTGAACC ATCAAAAAGA GATACACAAA ATGATATACA CAATGATATA 6060 
CCTAGTGATA TACCAAATAG TGACACACCA CCACCCATTA CTGATGATGA ATGGAATCAA 6120 
TTGAAAAAAG ATTTTATATC TAATATGTTA CAAAATACAC AAAATACGGA ACCAAATATT 6180 

TTACATGATA ATGTGGATAA TAATACCCAT CCTACCATGT CACGTCATAA TATGGACCAA 6240 
AAACCTTTTA TTATGTCCAT ACATGATAGA AATTTATTTA GTGGAGAAGA ATACAATTAT 6300 
GATATGTTTA ATAGTGGGAA TAATCCAATA AACATTAGTG ATTCAACAAA TAGTATGGAT 6360 
AGTCTAACAA GTAACAACCA TAGTCCATAT AATGATAAAA ATGATTTATA TAGTGGTATC 6420 
GACCTAATCA ACGACGCACT AAGTGGTAAT CATATTGATA TATATGATGA AATGCTCAAA 6480 

CGAAAAGAAA ATGAATTATT CGGGACGCAA CATCATCCAA AAAATATAAC GTCTAACCGT 6540 
GTCGTTACCC AAACAAGTAG TGACGACCCT ATAACCAATC AAATAAATTT GTTCCATAAA 6600 
TGGTTAGATA GGCATAGAGA TATGTGCGAA AAGTGGAAAA ATAATCACGA ACGGTTACCC 6660 
AAATTGAAAG AATTGTGGGA AAATGAGACA CATAGTGGTG ACATAAATAG TGGTATACCT 6720 
AGTGGTAACC ATGTGTTGAA TACTGATGTT TCTATTCAAA TAGATATGGA TAATCGGAAA 6780 

ACAATGAATG AATTTACTAA TATGGATACA AACCCCGACA AATCTACTAT GGATACTATA 6840 
TTGGATGATC TAGAAAAATA TAACGAACCC TACTACTATG ATTTTTATAA ACATGATATC 6900 
TATTATGATG TTAATGATGA TAAAGCATCT GAGGATCATA TAAATATGGA TCATAATAAG 6960 
ATGGATAATA ATAATTCGGA TGTCCCCACT AACGTACAAA TTGAAATGAA TGTCATTAAT 7020 
AATCAGGAGT TACTACAAAA TGAATATCCT ATATCGCATA TGTAGGGAAT ATGAAAATAA 7080 

TAGATGTATA TATGTTTTTT TCTTTTTTTG TGTGTGTGCA GTTTATATTT TTTATTTGTA 7140 
GATGTTATAT ATTTTTTTTA TTTGTGGGTT ATATTATAAT TTTTATTTAT GGGTTATATA 7200 
TATATTTTTT TTTTTGTGCA TTTGTCTATT TTTTATTTGT GCTTTATATA TATATATATT 7260 
TTATTCAGCT TGGACTTAAC CAGGCTGAAC TTGCT 7295 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 2182 amino acids 
(B) TYPE: amino acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 
(v) FRAGMENT TYPE: N-terminal 



PCT/US96/09508 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Glu Pro Gly Gly Ser Gly Gly Arg Gly Ser Gly Gly Ser Ser Ser 
Gly Lys Gly Lys Lys Asp Thr Ser Glu T^r lie Tyr Val Ser A^p Ala 
Lys Asp Leu Leu Asp Arg Val Gly oL Lys Val Tyr Glu s'lu Lys Val 
Lys Asn Gly Asp Ala Lys Lys Tyr lie Glu Ala Leu Lys Gly Asn Leu 
Asn Thr Ala Asn Gly Arg Ser Ser Glu Thr Ala S^r Ser He Glu Thr 
Cys Thr Leu Val Lys Glu Tyr Tyr Glu Arg vL Asn Gly Asp Gly Lys 
Arg His Pro Cys Arg Lys Asp Ala Lys A^n Glu Asp Val Asn A^rg Phe 
Ser Asp Thr Leu Gly Gly Gin Cys Thr Tyr Asn Arg He Ly" Asp Ser 

Gin Gin Gly Asp Asn Lys Val Gly Ala Cys Ala Pro Arg Arg Leu 

135 * 140 



His Leu Cys Asp Tyr Asn Leu Glu Ser He Asp Thr Thr Ser Thr Thr 
His Lys Leu Leu Leu Glu Val Cys Met Ala HI Lys Tyr Glu Gly Hn 
Ser lie Asn Thr His Tyr Thr Gin Hi. Gin Arg Thr Asn Glu Zip Ser 
Ala ser Gin Leu Cys Thr Val Leu Ala" Arg Ser Phe- Ala A^p He Gly 
Asp lie Val Arg Gly Lys Asp Leu Tyr Leu Gly Tyr A^p Asn Lys Glu 
Lys Glu Gin Arg Lys Lys Leu Glu Gin Lys Leu Lys* Asp He Phe Lys 
Lys lie His Lys Asp Val Met Lys Thr Asn Gly Ala Gin Glu Arg Tyr 
He Asp Asp Ala Lys; Gly Gly Asp Phe lhe° Gin Leu Arg Glu Asp Trp 
Trp Thr Ser Asn Arg Glu Thr Val Trp" Lys Ala Leu lie His Ala 

Pro Lys Glu Ala Asn Tyr Phe He Lys Thr Ala Cys A^n Val Gly Lys 
Gly Thr Asn Gly Gin Cys His Cys lie Gly Gly a'sp Val Pro Thr Tyr 
Phe Asp Tyr Val Pro Gin Tyr Leu Arg Trp Pne Glu Glu Trp Ala Su 
Asp Phe Cys Arg Lys Lys Lys Lys Lys Leu Glu Asn Leu Gin ly! Gin 
Cys Arg Asp Tyr Glu Gin Asn Leu %£. Cys Ser Gly Asn Gly Tyr Asp 

Gly Lys 
Arg Met 

Asn Gin Lys Lys Glu Phe^ Leu L ys Gin Lys Arg Lys Tyr Glu Thr G Ju 



Cys Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Gl^ Thr Trp He Asp 



He Ser Gly Gly Gly Ser Gly Lys Ser Pro Lys Arg Thr Lys Ala 
Ala Arg s er S er Ser Ser Ser Asp Asp Asn Gly Tyr Glu S^r Lys Phe 

_ ^ 440 44c 

Tyr Lys Lys Leu Lys Glu Val Gly Tyr Gin Asp Val Asp Lys Phe Leu 

Lys lie Leu Asn Lys Glu Gly He Cys Gin Lys Gl" Pro Gin Val Gly 

Asn Glu Lys Ala Asp Asn Val Asp Phe Thr Isn Glu Lys Tyr Val Lys 

490 . » c 

Thr Phe Ser Arg Thr Glu He Cys Glu Pro Cys Pro Trp Cys Gly Leu 
500 505 510 

Glu Lys Gly Gly Pro Pro Trp Lys Val Lys Gly Asp Lys Thr Cys Gly 
515 520 525 

Ser Ala Lys Thr Lys Thr Tyr Asp Pro Lys Asn Ile Thr Asp Ile Pro 
530 535 540 

Val Leu Tyr Pro Asp Lys Ser Gln Gln Asn Ile Leu Lys Lys Tyr Lys 
545 550 555 560 

Asn Phe Cys Glu Lys Gly Ala Pro Gly Gly Gly Gln Ile Lys Lys Trp 
565 570 575 

Gln Cys Tyr Tyr Asp Glu His Arg Pro Ser Ser Lys Asn Asn Asn Asn 
580 585 590 

Cys Val Glu Gly Thr Trp Asp Lys Phe Thr Gln Gly Lys Gln Thr Val 
595 600 605 

Lys Ser Tyr Asn Val Phe Phe Trp Asp Trp Val His Asp Met Leu His 
610 615 620 

Asp Ser Val Glu Trp Lys Thr Glu Leu Ser Lys Cys Ile Asn Asn Asn 
625 630 635 640 

Thr Asn Gly Asn Thr Cys Arg Asn Asn Asn Lys Cys Lys Thr Asp Cys 
645 650 655 

Gly Cys Phe Gln Lys Trp Val Glu Lys Lys Gln Gln Glu Trp Met Ala 
660 665 670 

Ile Lys Asp His Phe Gly Lys Gln Thr Asp Ile Val Gln Gln Lys Gly 
675 680 685 

Leu Ile Val Phe Ser Pro Tyr Gly Val Leu Asp Leu Val Leu Lys Gly 
690 695 700 

Gly Asn Leu Leu Gln Asn Ile Lys Asp Val His Gly Asp Thr Asp Asp 
705 710 715 720 

Ile Lys His Ile Lys Lys Leu Leu Asp Glu Glu Asp Ala Val Ala Val 
725 730 735 

Val Leu Gly Gly Lys Asp Asn Thr Thr Ile Asp Lys Leu Leu Gln His 
740 745 750 



Glu 


Lys 


Glu Gin Ala 


Glu 


Gin 


Cys 


Lys 


Gin 


Lys Gin 


Glu 


Glu 


Cys 


Glu 






755 






760 






765 






Lys 


Lys 


Ala Gin Gin 


Glu 


Ser Arg 


Gly 


Arg 


Ser Ala 


Glu 


Thr 


Arg 


Glu 




770 






775 








780 








Asp 


Glu 


Arg Thr Gin 


Gin 


Pro 


Ala 


Asp 


Ser 


Ala Gly 


Glu 


Val 


Glu 


Glu 


785 






790 










795 








800 


Glu 


Glu 


Asp Asp Asp 


Asp 


Tyr Asp 


Glu 


Asp 


Asp Glu 


Asp 


Asp 


Asp 


Val 






805 










810 








815 




Val 


Gin 


Glu Glu Glu 


Glu 


Gly Lys 


Glu 


Glu -Gly Thr 


Val 


Thr 


Glu 


Val 






820 








825 








830 






Thr 


Glu 


Val Thr Glu 


Val 


Val 


Glu 


Glu 


Thr 


Val Thr 


Glu 


Gin 


Glu 


Gly 






835 






840 








845 






Val 


Lys 


Pro Cys Asp 


He 


Val 


Gly 


Lys 


Leu 


Phe Glu 


Asp 


Asp 


Lys 


Ser 




850 






855 








860 










Leu 


Lys 


Glu Ala Cys 


Gly 


Leu 


Lys 


Tyr 


Gly 


Pro Gly 


Gly 


Lys 


Glu 


Lys 


865 






870 










875 








880 


Phe 


Pro 


Asn Trp Lys 


Cys 


Val 


Thr 


Pro 


Ser Gly Val 


Ser 


Thr 


Ala 


Thr 






885 










890 








895 




Ser 


Gly 


Lys Asp Gly 


Ala 


He 


Cys 


Val 


Pro 


Pro Arg. 


Arg 


Arg 


Arg 


Leu 






900 








905 








910 






Tyr 


Val 


Gly Gly Leu 


Ser 


Gin 


Trp 


Ala 


Ser Arg Gly 


Gly 


Asp 


Glu 


Thr 






915 






920 








925 








Thr 


Glu 


Val Ser Ser 


Glu 


Ala 


Thr 


Ser 


Ala 


Pro Ser 


Gin 


Ser 


Glu 


Ser 




93 0 






935 








940 










Glu 


Lys 


Leu Arg Thr 


Ala 


Phe 


He 


Glu 


Ser 


Ala Ala 


He 


<31u 


Thr 


Phe 


945 






950 










955 








960 


Phe 


Leu 


Trp His Lys 


Tyr 


Lys 


Glu 


Glu 


Lys 


Lys Pro 


Pro 


Ala 


Thr 


Gin 






965 










970 








975 




Asp 


Gly Ala Gly Leu 


Gly Val 


Ser 


Leu 


Pro 


Glu Pro 


Ser 


Pro 


Pro 


Gly 






980 








985 








990 




Glu 


Asp 


Pro Gin Thr 


Gin 


Leu 


Gin 


Gin 


Thr Gly Val 


He 


Pro 


Pro 


Asp 






995 






1000 






1005 




Phe 


Leu 


Arg Gin Met 


Phe 


Tyr 


Thr 


Leu 


Ala 


Asp Tyr 


Lys 


Asp 


He 


Leu 




1010 




1015 






1020 
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Tyr Ser Gly Ser Asn Asp Thr Ser Asp Thr Thr Gly Lys Gin Thr Pro 

e 025 e „ 1030 1035 1040 

Ser Ser Ser Asn Asp Asn Leu Lys Asn Ile Val Leu Glu Ala Ser Gly
1045 1050 1055

Ser Thr Glu Gln Glu Lys Glu Lys Met Lys Gln Ile Gln Ala Lys Ile
1060 1065 1070

Lys Lys Ile Leu Asn Gly Ala Thr Ser Gly Val Pro Pro Val Thr Lys
1075 1080 1085

Asn [illegible] Val Lys Thr Pro Gln Gln Thr Trp Trp Glu Asn Ile Ala Lys
1090 1095 1100

Asp Ile Trp Asn Ala Met Val Cys Ala Leu Thr Tyr Lys Glu Asn Asp
1105 1110 1115 1120

Ala Arg Gly Thr Ser Ala Lys Ile Glu Gln Asn Lys Asp Leu Lys Lys
1125 1130 1135

Ala Leu Trp Asp Glu Ala Asn Lys Asn Thr Pro Ile Glu Lys Tyr Gln
1140 1145 1150

Tyr Thr Asn Val Lys Leu Glu Asp Glu Ser Gly Ala Lys Ser Asn Asp
1155 1160 1165

Thr Ile Gln Pro Pro Thr Leu Lys Asn Phe Val Glu Ile Pro Thr Phe
1170 1175 1180

Phe Arg Trp Leu His Glu Trp Gly Asn Ser Phe Cys Phe Glu Arg Ala
1185 1190 1195 1200

Lys Arg Leu Ala Gln Ile Lys His Glu Cys Met Asp Glu Asp Gly Glu
1205 1210 1215

Lys Gln Tyr Ser Gly Asp Gly Glu Tyr Cys Glu Glu Ile Phe Ser Lys
1220 1225 1230

Gln Tyr Asn Val Leu Gln Asp Leu Ser Ser Ser Cys Ala Lys Pro Cys
1235 1240 1245

Arg Leu Tyr Lys Thr Trp Ile Glu Lys Lys Lys Thr Glu Tyr Glu Lys
1250 1255 1260

Gln Gln Lys Ala Tyr Glu Gln Gln Lys Ser Asn Tyr Glu Asn Glu Gln
1265 1270 1275 1280

Lys Asp Lys Cys Gln Thr Gln Ser Asn Asn Asn Ala Asn Glu Phe Ser
1285 1290 1295

Arg Thr Leu Gly Ala Ser Pro Thr Ala Ala Glu Phe Leu Gln Lys Leu
1300 1305 1310

Gly Ser Cys Lys Asn Asp Asn Gly Tyr Glu Asn Gly Glu Asp Asn Lys
1315 1320 1325

ah Ile Asp Phe Lys Asn Pro Asp Thr Phe Glu Ala His Ser Cys
1330 1335 1340

Asp Pro Cys Pro Ile Thr Gly Val Lys Cys Gln Asn Gly His Cys Val
1345 1350 1355 1360

Gly Ser Ala Asn Gly Lys Glu Cys Lys Asn Asn Lys Ile Thr Ala Glu
1365 1370 1375

Asp Ile Lys Asn Lys Thr Asp Pro Asn Gly Asn Ile Glu Met Val Val
1380 1385 1390

Ser Asp Asp Ser Thr Asn Thr Phe Glu His Leu Gly Asp Cys Lys Ser
1395 1400 1405

Ser Gly Ile Phe Lys Gly Ile Arg Lys Asp Glu Trp Lys Cys Ala Asn
1410 1415 1420

Val Cys Gly Val Asp Ile Cys Thr Leu Glu Lys Lys Ile Lys Asn Gly
1425 1430 1435 1440

Gln Glu Gly Asp Lys Lys Tyr Ile Thr Met Lys Glu Leu Leu Lys Arg
1445 1450 1455

Trp Leu Glu Tyr Phe Leu Glu Asp Tyr Asn Arg Ile Arg Lys Lys Ile
1460 1465 1470

Lys Leu Cys Thr Lys Lys Glu Asp Gly Cys Lys Cys Ile Lys Gly Cys
1475 1480 1485

Ile Glu Lys Trp Val Gln Glu Lys Thr Lys Glu Trp Gln Lys Ile Asn
1490 1495 1500

Asp Thr Tyr Leu Glu Gln Tyr Lys Asn Asp Asp Gly Asn Thr Leu Thr
1505 1510 1515 1520

Asn Phe Leu Glu Gln Phe Gln Tyr Arg Thr Glu Phe Lys Asn Ala Ile
1525 1530 1535

Lys Pro Cys Asp Gly Leu Asp Gln Phe Lys Thr Ser Cys Gly Leu Asn
1540 1545 1550

Ser Thr Asp Asn Ser Gln Asn Gly Asn Asn Asn Asp Leu Val Leu Cys
1555 1560 1565

Leu Leu Asn Lys Leu Gln Lys Lys Ile Ser Glu Cys Lys Glu Gln His
1570 1575 1580

Ser Gly Gln Thr Gln Thr Pro Cys Asp Asn Ser Ser Leu Ser Gly Lys
1585 1590 1595 1600

Glu Ser Thr Leu Val Glu Asp Val Asp Asp Tyr Glu Glu Gln Asn Pro
1605 1610 1615

Glu Asn Lys Val Glu Gln Pro Lys Phe Cys Pro Asp Met Lys Glu Pro
1620 1625 1630

Lys Lys Glu Asn Asp Glu Glu Val Gly Thr Cys Gly Gly Asp Glu Glu
1635 1640 1645

Lys Lys Lys Val Glu Asp Ser Val Ile Glu Gln Lys Glu Glu Glu Ala
1650 1655 1660

Ala Ser Ala Pro Glu Glu Ser Pro Pro Leu Thr Pro Glu Ala Pro Lys
1665 1670 1675 1680

Lys Glu Glu Asn Val Val Pro Lys Pro Pro Pro Pro Pro Lys Lys Arg
1685 1690 1695

Arg Ile Lys Thr Arg Asn Val Leu Asp His Pro Ala Val Ile Pro Ala
1700 1705 1710

Leu Met Ser Ser Thr Ile Met Trp Ser Ile Gly Ile Gly Phe Ala Ala
1715 1720 1725

Phe Thr Tyr Phe Tyr Leu Lys Lys Lys Thr Lys Ser Ser Val Gly Asn
1730 1735 1740

Leu Phe Gln Ile Leu Gln Ile Pro Lys Ser Asp Tyr Asp Ile Pro Thr
1745 1750 1755 1760

Leu Lys Ser Ser Asn Arg Tyr Ile Pro Tyr Ala Ser Asp Arg His Lys
1765 1770 1775

Gly Lys Thr Tyr Ile Tyr Met Glu Gly Asp Ser Ser Gly Asp Glu Lys
1780 1785 1790

Tyr Ala Phe Met Ser Asp Thr Thr Asp Ile Thr Ser Ser Glu Ser Glu
1795 1800 1805

Tyr Glu Glu Leu Asp Ile Asn Asp Ile Tyr Val Pro Gly Ser Pro Lys
1810 1815 1820

Tyr Lys Thr Leu Ile Glu Val Val Leu Glu Pro Ser Lys Arg Asp Thr
1825 1830 1835 1840

Gln Asn Asp Ile His Asn Asp Ile Pro Ser Asp Ile Pro Asn Ser Asp
1845 1850 1855

Thr Pro Pro Pro Ile Thr Asp Asp Glu Trp Asn Gln Leu Lys Lys Asp
1860 1865 1870

Phe Ile Ser Asn Met Leu Gln Asn Thr Gln Asn Thr Glu Pro Asn Ile
1875 1880 1885

Leu His Asp Asn Val Asp Asn Asn Thr His Pro Thr Met Ser Arg His
1890 1895 1900

Asn Met Asp Gln Lys Pro Phe Ile Met Ser Ile His Asp Arg Asn Leu
1905 1910 1915 1920

Phe Ser Gly Glu Glu Tyr Asn Tyr Asp Met Phe Asn Ser Gly Asn Asn
1925 1930 1935

Pro Ile Asn Ile Ser Asp Ser Thr Asn Ser Met Asp Ser Leu Thr Ser
1940 1945 1950

Asn Asn His Ser Pro Tyr Asn Asp Lys Asn Asp Leu Tyr Ser Gly Ile
1955 1960 1965

Asp Leu Ile Asn Asp Ala Leu Ser Gly Asn His Ile Asp Ile Tyr Asp
1970 1975 1980

Glu Met Leu Lys Arg Lys Glu Asn Glu Leu Phe Gly Thr Gln His His
1985 1990 1995 2000

Pro Lys Asn Ile Thr Ser Asn Arg Val Val Thr Gln Thr Ser Ser Asp
2005 2010 2015

Asp Pro Ile Thr Asn Gln Ile Asn Leu Phe His Lys Trp Leu Asp Arg
2020 2025 2030

His Arg Asp Met Cys Glu Lys Trp Lys Asn Asn His Glu Arg Leu Pro
2035 2040 2045

Lys Leu Lys Glu Leu Trp Glu Asn Glu Thr His Ser Gly Asp Ile Asn
2050 2055 2060

?nL Gly Ile Pro Ser Gly Asn His Val Leu Asn Thr Asp Val Ser Ile

2065 2070 2075 2080

Gln Ile Asp Met Asp Asn Pro Lys Thr Met Asn Glu Phe Thr Asn Met
2085 2090 2095

Asp Thr Asn Pro Asp Lys Ser Thr Met Asp Thr Ile Leu Asp Asp Leu
2100 2105 2110

Glu Lys Tyr Asn Glu Pro Tyr Tyr Tyr Asp Phe Tyr Lys His Asp Ile
2115 2120 2125

Tyr [illegible] Asp Val Asn Asp Asp Lys Ala Ser Glu Asp His Ile Asn Met
2130 2135 2140

[illegible] His Asn Lys Met Asp Asn Asn Asn Ser Asp Val Pro Thr Asn Val
2145 2150 2155 2160

Gln Ile Glu Met Asn Val Ile Asn Asn Gln Glu Leu Leu Gln Asn Glu
2165 2170 2175

Tyr Pro Ile Ser His Met
2180

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
ATCGATCAGC TGGGAAGAAA TACTTCATCT 30 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
ATCGATGGGC CCCGAAGTTT GTTCATTATT 30 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TCTCGTCAGC TGACGATCTC TAGTGCTATT

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
ACGAGTGGGC CCTGTCACAA CTTCCTGAGT

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 
AGACCTCAAT TTCTAAG 

(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
AATCGCGAGC ATCATCTG

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CCRAGRAGRC AARAAYTATG 20
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCAWCKKARR AATTGWGG 18
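
SEQ ID NO: 23 and SEQ ID NO: 24 are written with IUPAC ambiguity codes (R, Y, W, K), so each entry stands for a pool of concrete oligonucleotides rather than a single sequence. The following is a minimal Python sketch, not part of the original listing, of how the pool size could be computed and the individual primers enumerated. The function names `degeneracy` and `expand` are illustrative assumptions.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]

# Primer strings copied from the listing with the spacing removed.
SEQ_ID_23 = "CCRAGRAGRCAARAAYTATG"
SEQ_ID_24 = "CCAWCKKARRAATTGWGG"

if __name__ == "__main__":
    for name, primer in (("SEQ ID NO: 23", SEQ_ID_23), ("SEQ ID NO: 24", SEQ_ID_24)):
        print(name, len(primer), "nt, degeneracy", degeneracy(primer))
```

Under these assumptions SEQ ID NO: 23 has five two-fold positions (four R, one Y) and SEQ ID NO: 24 has six (two W, two K, two R), so the pools contain 32 and 64 sequences respectively.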

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 291 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Val Cys Ile Pro Asp Arg Arg Tyr Gln Leu Cys Met Lys
20 25 30

Glu Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Asp Phe Cys Lys Asp Ile Arg Trp Ser Leu Gly Asp Phe Gly Asp
85 90 95

Ile Ile Met Gly Thr Asp Met Glu Gly Ile Gly Tyr Ser Lys Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Thr Asp Glu Lys Ala Gln Gln
115 120 125

Arg Arg Lys Gln Trp Trp Asn Glu Ser Lys Ala Gln Ile Trp Thr Ala
130 135 140

Met Met Tyr Ser Val Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Glu Pro Gln Ile Tyr Arg Trp
165 170 175

Ile Arg Glu Trp Gly Arg Asp Tyr Val Ser Glu Leu Pro Thr Glu Val
180 185 190

Gln Lys Leu Lys Glu Lys Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
195 200 205

Xaa Xaa Cys Xaa Val Pro Pro Cys Gln Asn Ala Cys Lys Ser Tyr Asp
210 215 220

Gln Trp Ile Thr Arg Lys Lys Asn Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Cys Xaa Cys
290



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 271 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTI SENSE: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Val Cys Ile Pro Asp Arg Arg Ile Gln Leu Cys
20 25 30

Ile Val Asn Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Phe Cys Asn Asp Leu Lys Asn
65 70 75 80

Ser Phe Leu Asp Tyr Gly His Leu Ala Met Gly Asn Asp Met Asp Phe
85 90 95

Gly Gly Tyr Ser Thr Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Ser Glu His Lys Ile Lys Asn Phe Arg Lys
115 120 125

Glu Trp Trp Asn Glu Phe Arg Glu Lys Leu Trp Glu Ala Met Leu Ser
130 135 140

Glu His Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Glu
145 150 155 160

Leu Gln Ile Thr Gln Trp Ile Lys Glu Trp His Gly Glu Phe Leu Leu
165 170 175

Glu Arg Asp Asn Arg Ser Lys Leu Pro Lys Ser Lys Cys Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Cys Xaa Glu Lys Glu Cys Ile Asp Pro Cys Met
195 200 205

Lys Tyr Arg Asp Trp Ile Ile Arg Ser Lys Phe Xaa Xaa Xaa Xaa Xaa
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Cys
260 265 270

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Cys Val Pro Pro Arg Arg
20 25 30

Gln Glu Leu Cys Leu Gly Asn Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Glu Val Cys Lys
65 70 75 80

Ile Ile Asn Lys Thr Phe Ala Asp Ile Arg Asp Ile Ile Gly Gly Thr
85 90 95

Asp Tyr Trp Asn Asp Leu Ser Asn Arg Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asn Lys Lys Asn Asp Lys Leu Phe
115 120 125

Arg Asp Glu Trp Trp Lys Val Ile Lys Lys Asp Val Trp Asn Val Ile
130 135 140

Ser Trp Phe Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Ile Pro Gln Phe Phe Arg Trp Phe Ser Glu Trp Gly Asp Asp Tyr Cys
165 170 175

Gln Asp Lys Thr Lys Met Ile Glu Thr Leu Lys Val Glu Cys Xaa Xaa
180 185 190

Xaa Xaa Cys Xaa Asp Asp Asn Cys Lys Ser Lys Cys Asn Ser Tyr Lys
195 200 205

Glu Trp Ile Ser Lys Lys Lys Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Cys Xaa Xaa Cys
275
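
The consensus entries in this listing use Xaa for positions where any residue is allowed, which makes them directly usable as search patterns. The following is a minimal Python sketch, not part of the original listing, that converts a three-letter consensus into a one-letter string and a regular expression. The fragment shown corresponds to residues 24 to 40 of SEQ ID NO: 27; the function names and the candidate peptide are hypothetical and serve only as illustration.

```python
import re

# Standard three-letter to one-letter amino acid codes; "Xaa" (any residue) maps to "X".
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Xaa": "X",
}

def to_one_letter(three_letter):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())

def to_regex(one_letter):
    """Turn a one-letter consensus into a regex; each X matches any single residue."""
    return re.compile(one_letter.replace("X", "."))

# Residues 24 to 40 of SEQ ID NO: 27 as given in the listing.
FRAGMENT = ("Xaa Xaa Val Cys Val Pro Pro Arg Arg "
            "Gln Glu Leu Cys Leu Gly Asn Ile")

if __name__ == "__main__":
    consensus = to_one_letter(FRAGMENT)
    pattern = to_regex(consensus)
    candidate = "AKVCVPPRRQELCLGNI"  # hypothetical peptide, not taken from the source
    print(consensus, bool(pattern.fullmatch(candidate)))
```

A full-length consensus can be handled the same way; `pattern.search` rather than `fullmatch` would locate the motif inside a longer protein sequence.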

(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 282 amino acids 

(B) TYPE: amino acid 



WO 96/40766 



PCT/US96/09508 



-79- 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Cys Gly Pro Pro Arg Arg
20 25 30

Gln Gln Leu Cys Leu Gly Tyr Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Ile Cys Asn
65 70 75 80

Ala Ile Leu Gly Ser Tyr Ala Asp Ile Gly Asp Ile Val Arg Gly Leu
85 90 95

Asp Val Trp Arg Asp Ile Asn Thr Asn Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Lys Gln Asn Asp Asn
115 120 125

Asn Glu Arg Asn Lys Trp Trp Glu Lys Gln Arg Asn Leu Ile Trp Ser
130 135 140

Ser Met Val Lys His Ile Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
145 150 155 160

Xaa Xaa Xaa Xaa Ile Pro Gln Phe Leu Arg Trp Leu Lys Glu Trp Gly
165 170 175

Asp Glu Phe Cys Glu Glu Met Gly Thr Glu Val Lys Gln Leu Glu Lys
180 185 190

Ile Cys Xaa Xaa Xaa Xaa Cys Xaa Glu Lys Lys Cys Lys Asn Ala Cys
195 200 205

Ser Ser Tyr Glu Lys Trp Ile Lys Glu Arg Lys Asn Xaa Xaa Xaa Xaa
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
245 250 255

Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
275 280



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 324 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ala Cys Ile Pro Pro Arg Arg Gln Lys
20 25 30

Leu Cys Leu His Tyr Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asp Phe Lys Arg Gln Met Phe
85 90 95

Tyr Thr Phe Ala Asp Tyr Arg Asp Ile Cys Leu Gly Thr Asp Ile Ser
100 105 110

Ser Lys Lys Asp Thr OCJL Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
115 120 125

Xaa Xaa Xaa Xaa Xaa [illegible] Ile Ser Asn Ser Ile Arg Tyr Arg [illegible] [illegible]
130 135 140

Trp Trp Glu Thr Asn [illegible] Pro Val Ile Trp Glu Gly Met Leu Cys Ala
145 150 155 160

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
180 185 190

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Arg Pro Gln [illegible] [illegible]
195 200 205

Arg Trp Leu Thr Glu Trp Gly Glu Asn Phe Cys Lys Glu Gln [illegible] [illegible]
210 215 220

Glu Tyr Lys Val Leu Leu Ala Lys Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa
225 230 235 240

Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Cys Val Ala Cys Lys Asp [illegible] [illegible]
245 250 255

Lys Gln Tyr His Ser Trp Ile Gly Ile Trp Ile Asp Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys
305 310 315 320

Xaa Xaa Xaa Cys

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 362 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(v) FRAGMENT TYPE: internal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Ala Cys Ala Pro Tyr Arg Arg Leu His Leu Cys Asp Tyr Asn Leu Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
20 25 30

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Leu Cys Thr Val Leu
50 55 60

Ala Arg Ser Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Lys Asp Leu
65 70 75 80

Tyr Leu Gly Tyr Asp Asn Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
85 90 95

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
100 105 110

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Lys Gly Gly Asp
115 120 125

Phe Phe Gln Leu Arg Glu Asp Trp Trp Thr Ser Asn Arg Glu Thr Val
130 135 140

Trp Lys Ala Leu Ile Cys His Ala Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
145 150 155 160

Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Val Pro Gln Tyr Leu
180 185 190

Arg Trp Phe Glu Glu Trp Ala Glu Asp Phe Cys Arg Lys Lys Lys Lys
195 200 205

Lys Leu Glu Asn Leu Gln Lys Gln Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys
210 215 220

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa [illegible]
225 230 235 240

Thr Asn Cys Ser Val Trp Cys Arg Met Tyr Glu Thr Trp Ile Asp Asn
245 250 255

Gln Lys Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
290 295 300

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
305 310 315 320

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
325 330 335

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
340 345 350

Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys
355 360

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 411 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: NO 

(iv) ANT I SENSE : NO 

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 



Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

1 5 10 15

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

20 25 30

Ala Cys Ala Pro Tyr Arg Arg Leu His Val Cys Asp Gln Asn Leu Xaa

35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Ile Cys Thr

85 90 95

Met Leu Ala Arg Ser Phe Ala Asp Ile Gly Asp Ile Val Arg Gly Arg

100 105 110

Asp Leu Tyr Leu Gly Asn Pro Gln Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa

115 120 125

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

130 135 140

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Asn Asp Pro Glu Phe Phe Lys Leu Arg
145 150 155 160

Glu Asp Trp Trp Thr Ala Asn Arg Glu Thr Val Trp Lys Ala Ile Thr

165 170 175

Cys Asn Ala Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa 

180 185 190 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

195 200 205 

Xaa Xaa Xaa Xaa Val Pro Gln Tyr Leu Arg Trp Phe Glu Glu Trp Ala

210 215 220

Glu Asp Phe Cys Arg Lys Lys Asn Lys Lys Ile Lys Asp Val Lys Arg
225 230 235 240

Asn Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa

245 250 255

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

260 265 270

Xaa Xaa Xaa Xaa Xaa Cys Ile Ser Cys Leu Tyr Ala Cys Asn Pro Tyr

275 280 285

Val Asp Trp Ile Asn Asn Gln Lys Glu Xaa Xaa Xaa Xaa Xaa Xaa Xaa

290 295 300 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 
305 310 315 320 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

325 330 335 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

340 345 350 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa 

355 360 365 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

370 375 380 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 
385 390 395 400 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Cys 
405 410 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 411 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

1 5 10 15 

Xaa Xaa Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 

20 25 30 

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa 
[...] 35

[...] Xaa [...] Xaa [...] Xaa Xaa [...] Xaa Xaa Xaa Xaa [...] Xaa Xaa Xaa Xaa

110

Xaa [...] Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ala Met Cys Arg Ala Val Arg Tyr

115 120 125

Ser [...] Ala Asp Leu Gly Asp Ile Ile Arg Gly Arg Asp Met Trp Asp [...] Glu [...] Xaa Xaa [...] Thr [...] Xaa Xaa Cys Xaa [...] Xaa Xaa Xaa Xaa Xaa [...] Ile [...] Xaa

[...] 405 410



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 311 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: internal 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 



Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa Xaa
1 5 10 15

Xaa Xaa Xaa Xaa Xaa Xaa Ala Cys Met Pro Pro Arg Arg Gln Lys Leu

20 25 30

Cys Leu Tyr Tyr Ile Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

35 40 45

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

50 55 60

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa
65 70 75 80

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Gln Phe Leu Arg Ser Met Met

85 90 95

Tyr Thr Phe Gly Asp Tyr Arg Asp Ile Cys Leu Asn Thr Asp Ile Ser

100 105 110

Lys Lys Gln Asn Asp Val Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

115 120 125

Xaa Xaa Xaa Xaa Xaa Ser Lys Ser Pro Ser Gly Leu Ser Arg Gln Glu

130 135 140

Trp Trp Lys Thr Asn Gly Pro Glu Ile Trp Lys Gly Met Leu Cys Ala
145 150 155 160

Leu Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

165 170 175

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

180 185 190

Xaa Xaa Xaa Xaa Xaa Xaa Lys Pro Gln Phe Leu Arg Trp Met Ile Glu

195 200 205

Trp Gly Glu Glu Phe Cys Ala Glu Arg Gln Lys Lys Glu Asn Ile Ile

210 215 220

Lys Asp Ala Cys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys Xaa Xaa Xaa
225 230 235 240

Lys His Arg Cys Asn Gln Ala Cys Arg Ala Tyr Gln Glu Tyr Val Glu

245 250 255

Asn Lys Lys Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

260 265 270

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa

275 280 285

Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Cys

290 295 300

Xaa Xaa Xaa Xaa Cys Xaa Cys
305 310






(2) INFORMATION FOR SEQ ID NO: 34: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Pro Arg Arg Gln Xaa Leu Cys
1 5

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CCRAGRAGRC AARAAYTATG 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
CCSMGSMGSC AGCAGYTSTG 

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Phe Ala Asp Xaa Xaa Asp Ile
1 5 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TTTGCWGATW WWSGWGATAT 20

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE:



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
TTCGCSGATW WCSGSGACAT 20

(2) INFORMATION FOR SEQ ID NO: 40:



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Pro Gln Phe Xaa Arg Trp

1 5

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
CCAWCKKARR AATTGWGG 18

(2) INFORMATION FOR SEQ ID NO: 42:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE:

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
CCASCKGWAG AWCTGSGG 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(iii) HYPOTHETICAL: NO

(iv) ANTISENSE: NO

(v) FRAGMENT TYPE: N-terminal

(vi) ORIGINAL SOURCE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Glu Trp Gly Xaa Xaa Xaa Cys 
1 5 

(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44 
CAAWAWTCWT CWCCCCATTC 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: NO 

(iv) ANTISENSE: NO 

(v) FRAGMENT TYPE: 

(vi) ORIGINAL SOURCE: 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:



CAGWASTCST CSCCCCACTC 
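
The nucleic acid entries above (SEQ ID NOS: 35, 36, 38, 39, 41, 42, 44 and 45) use IUPAC ambiguity codes, so each entry denotes a pool of oligonucleotides rather than a single sequence. The following is a minimal Python sketch of how such an entry can be expanded and reverse-complemented. The function names `degeneracy`, `expand` and `reverse_complement` are illustrative and do not come from this specification, and reading SEQ ID NO: 41 as a reverse primer is an inference, not something the listing states.

```python
from itertools import product

# IUPAC nucleotide codes that occur in SEQ ID NOS: 35-45 and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}

# Complement of each code; S and W are their own complements.
COMPLEMENT = str.maketrans("ACGTRYSWKM", "TGCAYRSWMK")


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the degenerate pool."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence the degenerate primer represents."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


def reverse_complement(primer: str) -> str:
    """Reverse complement that preserves ambiguity codes (R<->Y, K<->M)."""
    return primer.translate(COMPLEMENT)[::-1]


seq_id_35 = "CCRAGRAGRCAARAAYTATG"  # SEQ ID NO: 35 with the spacing removed
seq_id_41 = "CCAWCKKARRAATTGWGG"    # SEQ ID NO: 41 with the spacing removed

print(degeneracy(seq_id_35))          # 32
print(expand(seq_id_35)[0])           # CCAAGAAGACAAAAACTATG
print(reverse_complement(seq_id_41))  # CCWCAATTYYTMMGWTGG
```

Under that assumption, the reverse complement of SEQ ID NO: 41 begins CCW-CAA (Pro-Gln) and ends TGG (Trp), which is consistent with the peptide motif of SEQ ID NO: 40.
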

WE CLAIM:

1. A composition comprising a nucleotide sequence of the DBL gene family, wherein said nucleotide 
sequence is selected from the group consisting of the var-1, var-2, var-3 and var-7 genes. 

2. The composition of Claim 1, wherein the nucleotide sequence of the var-1, var-2, var-3 or var-7
gene encodes a cysteine-rich domain homologous to a cysteine-rich domain of a Duffy Antigen Binding Protein (DABP)
derived from Plasmodium vivax and a Sialic Acid Binding Protein (SABP) derived from Plasmodium falciparum.

3. The composition of Claim 1, wherein the nucleotide sequence of the var-1, var-2, var-3 or var-7 
gene encodes a cysteine rich interdomain region between a first domain and a second domain. 

4. The composition of Claim 1, wherein the nucleotide sequence is derived from a coding region of
SEQ ID NO: 13 or SEQ ID NO: 15.

5. A composition comprising a polypeptide encoded by a nucleotide sequence of the DBL gene family,
wherein said polypeptide is encoded by a var-1, var-2, var-3 or var-7 gene.

6. The composition of claim 5, wherein the polypeptide comprises a sequence of amino acid residues
homologous to cysteine rich domains of a Duffy Antigen Binding Protein (DABP) derived from Plasmodium vivax and
a Sialic Acid Binding Protein (SABP) derived from Plasmodium falciparum.

7. The composition of claim 5, wherein the polypeptide comprises a sequence of about 300 to 400
amino acid residues occurring in the cysteine rich interdomain region between a first domain and a second domain of
a polypeptide encoded by the var-1, var-2, var-3 or var-7 gene.

8. The composition of claim 5, wherein the polypeptide comprises a sequence of amino acid residues
of SEQ ID NO: 14 or SEQ ID NO: 16.

9. The composition of claim 5, wherein the polypeptide comprises a sequence of about 50 to about
325 amino acid residues of SEQ ID NO: 14 or SEQ ID NO: 16.

10. The composition of claim 5, wherein the polypeptide comprises a sequence of about 75 to about
300 amino acid residues of SEQ ID NO: 14 or SEQ ID NO: 16.

11. The composition of claim 5, wherein the polypeptide comprises a sequence of about 100 to about
250 amino acid residues of SEQ ID NO: 14 or SEQ ID NO: 16.

12. The composition of claim 5, further comprising a pharmaceutically acceptable carrier and an
isolated Duffy Antigen Binding Protein (DABP) binding domain polypeptide, a Sialic Acid Binding Protein (SABP)
binding domain polypeptide, or a combination thereof, in an amount sufficient to induce a protective immune response
to Plasmodium merozoites in a mammal.

13. The composition of any of the preceding claims for use in inducing a protective immune response 
to Plasmodium merozoites in a mammal. 

14. Use of the composition of any one of claims 1-12 in the preparation of a medicament for inducing 
a protective immune response to Plasmodium merozoites in a mammal. 

15. A method of inducing a protective immune response to Plasmodium merozoites in a mammal,
comprising administering to a mammal an immunologically effective amount of a pharmaceutical composition
comprising a pharmaceutically acceptable carrier and an isolated cysteine-rich polypeptide encoded by a var gene
selected from the group of genes consisting of var-1, var-2, var-3 and var-7 genes.

16. The method of claim 15, further comprising administering to said mammal an immunologically
effective amount of a Duffy Antigen Binding Protein (DABP) binding domain polypeptide, a Sialic Acid Binding Protein
(SABP) binding domain polypeptide, or a combination thereof.
FIG. 3

UNIEBP5 and 5A: P R R Q K/E L C

UNIEBP5, for A+T biased codon usage:
CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG

UNIEBP5A, for G+C biased codon usage:
CC(C/G)-(C/A)G(C/G)-(C/A)G(C/G)-CAG-CAG-(C/T)T(C/G)-TG

UNIEBP5B and C: F A D L/Y G/R D I

UNIEBP5B, for A+T biased codon usage:
TTT-GC(A/T)-GAT-(A/T)(A/T)(A/T)-(G/C)G(A/T)-GAT-AT

UNIEBP5C, for G+C biased codon usage:
TTC-GC(G/C)-GAT-(A/T)(A/T)C-(G/C)G(G/C)-GAC-AT

UNIEBP3 and 3A: P Q F L/F R W

UNIEBP3, for A+T biased codon usage:
CCA-(A/T)C(T/G)-(T/G)A(A/G)-(A/G)AA-TTG-(A/T)GG

UNIEBP3A, for G+C biased codon usage:
CCA-(C/G)C(G/T)-G(A/T)A-GA(A/T)-CTG-(C/G)GG

UNIEBP3B and C: E W G D/E D/E Y/F C

UNIEBP3B, for A+T biased codon usage:
CA-A(A/T)A-(A/T)TC-(A/T)TC-(A/T)CC-CCA-TTC

UNIEBP3C, for G+C biased codon usage:
CA-G(A/T)A-(G/C)TC-(G/C)TC-(G/C)CC-CCA-CTC
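
The parenthesised alternatives used for the primers in FIG. 3 correspond position by position to the IUPAC ambiguity codes used in the sequence listing. As an illustration only, the following Python sketch collapses the figure notation into IUPAC codes so the two representations can be compared. The helper name `to_iupac` is hypothetical, and the sketch assumes that every ambiguous position offers exactly two alternatives, as is the case in the figure.

```python
import re

# IUPAC two-base ambiguity codes, keyed by the unordered pair of bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def to_iupac(primer: str) -> str:
    """Collapse FIG. 3 style notation, e.g. 'CC(A/G)-AG', into IUPAC codes."""
    def replace_pair(match: re.Match) -> str:
        return IUPAC_PAIRS[frozenset(match.group(1) + match.group(2))]

    collapsed = re.sub(r"\(([ACGT])/([ACGT])\)", replace_pair, primer)
    # Drop the codon separators used in the figure.
    return collapsed.replace("-", "")


uniebp5 = "CC(A/G)-AG(G/A)-AG(G/A)-CAA-(G/A)AA-(C/T)TA-TG"
uniebp3c = "CA-G(A/T)A-(G/C)TC-(G/C)TC-(G/C)CC-CCA-CTC"

print(to_iupac(uniebp5))   # CCRAGRAGRCAARAAYTATG
print(to_iupac(uniebp3c))  # CAGWASTCSTCSCCCCACTC
```

The two printed strings agree with SEQ ID NO: 35 and SEQ ID NO: 45 respectively once the spacing in the listing is removed.
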